Large Language Models (LLMs) are increasingly used as powerful tools for a wide range of natural language processing (NLP) applications. A recent innovation, in-context learning (ICL), enables LLMs to learn new tasks from a few examples supplied in the prompt at inference time, thereby eliminating the need for model fine-tuning. While LLMs have been applied in many settings, their ability to explain the behavior of other models remains relatively unexplored. Despite the growing number of explanation techniques, many require white-box access to the model and/or are computationally expensive, highlighting the need for next-generation post hoc explainers. In this work, we present the first framework to study the effectiveness of LLMs in explaining other predictive models. More specifically, we propose a novel framework encompassing four prompting strategies: i) Perturbation-based ICL, ii) Prediction-based ICL, iii) Instruction-based ICL, and iv) Explanation-based ICL, which differ in how much information they provide about the underlying ML model and the local neighborhood of the test sample. We conduct extensive experiments on real-world benchmark datasets and show that LLM-generated explanations perform on par with state-of-the-art post hoc explainers, drawing on both the ICL examples and the LLMs' internal knowledge. On average, across four datasets and two ML models, LLMs identify the most important feature with 72.19% accuracy, opening up new frontiers in explainable artificial intelligence (XAI) for LLM-based explanation frameworks.
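To make the perturbation-based ICL strategy more concrete, the following is a minimal sketch of how such a prompt might be assembled with only black-box access to the model. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian perturbation scheme, the prompt wording, the letter feature names, and the names `build_perturbation_icl_prompt` and `toy_model` are all hypothetical.

```python
import numpy as np


def build_perturbation_icl_prompt(predict, x, n_samples=8, scale=0.1, seed=0):
    """Assemble an ICL prompt from perturbed neighbors of the test sample x.

    `predict` is any black-box function mapping an (n, d) array to n labels.
    The perturbation scheme and prompt template are assumptions made for
    illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Sample the local neighborhood by adding Gaussian noise to x (assumed scheme).
    neighbors = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # Query the model; only its outputs are used, never its internals.
    labels = predict(neighbors)
    # Hypothetical anonymous feature names: A, B, C, ...
    names = [chr(ord("A") + i) for i in range(x.size)]
    lines = ["Dataset: perturbed inputs and the model's predictions."]
    for row, label in zip(neighbors, labels):
        feats = ", ".join(f"{n} = {v:.3f}" for n, v in zip(names, row))
        lines.append(f"Input: {feats}\nOutput: {int(label)}")
    lines.append(
        "Question: Based on the examples above, rank the features "
        "from most to least important for the model's output."
    )
    return "\n".join(lines)


def toy_model(X):
    # Hypothetical stand-in for the model being explained:
    # predicts 1 whenever the first feature exceeds 0.5.
    return (X[:, 0] > 0.5).astype(int)


prompt = build_perturbation_icl_prompt(toy_model, [0.5, 0.2, 0.9])
print(prompt)
```

The resulting string would be sent to an LLM, and the feature ranking in its reply compared against a reference explanation. The other three strategies would change what the prompt contains, not how the model is queried.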