Abstract
This document introduces an explainable AI (XAI) methodology, referred to as XPLAIN, that addresses the challenges of LLM interpretability by developing a perturbation-based, normalized vector-similarity metric to identify and quantify the word-level importance of a prompt's constituent words on the output generated by an LLM. By analyzing which words or phrases most significantly influence an LLM's decisions, the described techniques enhance the explainability and interpretability of large language models. The approach perturbs the prompt by masking individual words, generates an output from the LLM for each perturbed prompt, and then compares each of these outputs with the output of the unperturbed prompt using vector-based similarity at a semantic level. The word-level importance of each masked word in the perturbed sentences is then mathematically derived and referred to as the XPLAIN metric. Using this metric, a comprehensive analysis of LLM behavior is performed across three distinct paradigms: task-specific performance, including question answering, math-based Q/A, code analysis, and logical understanding; cross-model comparison; and multilingual capabilities. This analysis helps explain the biases and reasoning methodologies of these models. The described methodology contributes to the broader goal of creating more transparent and trustworthy AI, with implications for improving model design, mitigating biases, and enhancing the reliability of LLM applications across various domains.
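For concreteness, the following is a minimal sketch of the masking-and-comparison loop the abstract describes. The `query_llm` and `embed` callables are hypothetical placeholders for an LLM call and a sentence-embedding model, and the min-max normalization shown here is only illustrative; the actual XPLAIN metric is derived in the full disclosure.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def word_importance(prompt: str, query_llm, embed, mask_token: str = "[MASK]"):
    """Score each word of `prompt` by how much masking it shifts the LLM output.

    `query_llm(text) -> str` and `embed(text) -> np.ndarray` are assumed,
    user-supplied callables (not part of the original disclosure).
    """
    words = prompt.split()
    # Output embedding for the unperturbed prompt serves as the baseline.
    baseline = embed(query_llm(prompt))
    raw_scores = []
    for i in range(len(words)):
        # Perturb the prompt by masking one word at a time.
        perturbed = " ".join(words[:i] + [mask_token] + words[i + 1:])
        output_vec = embed(query_llm(perturbed))
        # A larger semantic shift from the baseline implies the masked
        # word had more influence on the generated output.
        raw_scores.append(1.0 - cosine_similarity(baseline, output_vec))
    scores = np.array(raw_scores)
    # Illustrative min-max normalization to [0, 1].
    span = scores.max() - scores.min()
    normalized = (scores - scores.min()) / span if span > 0 else np.zeros_like(scores)
    return list(zip(words, normalized.tolist()))
```

In this sketch, a higher score means that masking the word produced a larger semantic shift in the model's output, i.e., the word mattered more to the generation.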
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Recommended Citation
Dhar, Gopala and Devi, Sharmila, "XPLAIN: XAI for Interpretable LLMs through Perturbation Analysis and Normalized Vector Similarity", Technical Disclosure Commons (June 24, 2025).
https://www.tdcommons.org/dpubs_series/8273