Large Language Models (LLMs) are increasingly used as powerful tools for a wide range of natural language processing (NLP) applications. A recent innovation, in-context learning (ICL), enables LLMs to learn new tasks from a few examples supplied in the prompt at inference time, eliminating the need for model fine-tuning. While LLMs have been used in many applications, their ability to explain the behavior of other models remains relatively unexplored. Despite the growing number of explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting the need for next-generation post hoc explainers. In this work, we present the first framework to study the effectiveness of LLMs in explaining other predictive models. More specifically, we propose a novel framework encompassing multiple prompting strategies: i) Perturbation-based ICL, ii) Prediction-based ICL, iii) Instruction-based ICL, and iv) Explanation-based ICL, which provide varying levels of information about the underlying ML model and the local neighborhood of the test sample. We conduct extensive experiments with real-world benchmark datasets and demonstrate that LLM-generated explanations perform on par with state-of-the-art post hoc explainers, drawing on ICL examples and the LLMs' internal knowledge to generate model explanations. On average, across four datasets and two ML models, we observe that LLMs identify the most important feature with 72.19% accuracy, opening up new frontiers in explainable artificial intelligence (XAI) for exploring LLM-based explanation frameworks.
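To make the perturbation-based strategy concrete, the following is a minimal sketch of how a local neighborhood of a test sample might be turned into in-context examples for an LLM. It is an illustration under stated assumptions, not the templates used in this work: the Gaussian perturbation scheme, the prompt wording, and the names `build_perturbation_prompt` and `toy_model` are all hypothetical.

```python
import random
from typing import Callable, List, Sequence


def build_perturbation_prompt(
    x: Sequence[float],
    predict: Callable[[Sequence[float]], int],
    feature_names: Sequence[str],
    n_samples: int = 8,
    sigma: float = 0.1,
    seed: int = 0,
) -> str:
    """Format a local neighborhood of x as in-context examples for an LLM."""
    rng = random.Random(seed)
    base_pred = predict(x)  # black-box query on the unperturbed test sample
    examples: List[str] = []
    for _ in range(n_samples):
        # Assumed perturbation scheme: independent Gaussian noise per feature.
        delta = [rng.gauss(0.0, sigma) for _ in x]
        neighbor = [xi + di for xi, di in zip(x, delta)]
        # 1 if the model's prediction flipped relative to the test sample.
        changed = int(predict(neighbor) != base_pred)
        changes = ", ".join(
            f"{name}: {d:+.3f}" for name, d in zip(feature_names, delta)
        )
        examples.append(
            f"Change in input: {changes}\nChange in output: {changed}"
        )
    # Hypothetical instruction asking the LLM for a feature ranking.
    question = (
        "Based on the examples above, rank the features from most to "
        "least important for the model's prediction."
    )
    return "\n\n".join(examples + [question])


def toy_model(v: Sequence[float]) -> int:
    # Hypothetical black-box classifier in which only feature A matters.
    return int(v[0] > 0.5)


prompt = build_perturbation_prompt(
    [0.5, 0.2, 0.9], toy_model, ["A", "B", "C"], n_samples=4
)
print(prompt)
```

The sketch only needs query access to `predict`, which reflects the black-box setting described above. The resulting string would be sent to an LLM, and the top-ranked feature in its answer could then be compared against a reference explanation.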