The increasing complexity of large language models (LLMs) poses significant challenges to their transparency and interpretability, motivating the use of eXplainable AI (XAI) techniques to improve their trustworthiness and usability. This study introduces a comprehensive evaluation framework with four novel metrics for assessing the effectiveness of five XAI techniques across five LLMs and two downstream tasks. We apply this framework to LIME, SHAP, Integrated Gradients, Layer-wise Relevance Propagation (LRP), and Attention Mechanism Visualization (AMV), using the IMDB Movie Reviews and Tweet Sentiment Extraction datasets. The four metrics are Human-reasoning Agreement (HA), Robustness, Consistency, and Contrastivity. Our results show that LIME consistently achieves high scores across multiple LLMs and evaluation metrics, while AMV demonstrates superior Robustness and near-perfect Consistency. LRP excels in Contrastivity, particularly with more complex models. These findings clarify the strengths and limitations of the different XAI methods and offer guidance for developing and selecting appropriate XAI techniques for LLMs.