Post Hoc Explanations of Language Models Can Improve Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning. However, incorporating such rationales is difficult to scale because it requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses this challenge by automating the process of rationale generation. To this end, we leverage post hoc explanation methods that output attribution scores (explanations) capturing the influence of each input feature on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that AMPLIFY leads to prediction accuracy improvements of about 10-25% across a wide range of tasks, including those where prior approaches that rely on human-annotated rationales, such as Chain-of-Thought prompting, fall short. Our work is one of the first attempts to highlight the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each component of AMPLIFY, which, in turn, yield critical insights for refining in-context learning.
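The abstract does not specify which attribution method, model, or rationale template is used, so the following is only a minimal sketch of the general idea under stated assumptions: score each input word's influence on a label, keep the top-k words, and phrase them as a natural-language rationale attached to each in-context demonstration. A toy bag-of-words scorer (`TOY_WEIGHTS`, invented weights) stands in for the model being explained, leave-one-out (occlusion) attribution is swapped in as one simple post hoc explanation method, and the helpers `build_rationale` and `build_prompt`, along with their text templates, are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy model: a bag-of-words linear scorer with invented weights.
# A real setup would explain an actual trained model instead.
TOY_WEIGHTS: Dict[str, float] = {
    "great": 2.0, "loved": 1.5, "plot": 0.1, "boring": -2.0, "awful": -2.5,
}


def label_score(tokens: Sequence[str], label: str) -> float:
    """Toy evidence for `label`: the logit for 'positive', its negation otherwise."""
    logit = sum(TOY_WEIGHTS.get(t, 0.0) for t in tokens)
    return logit if label == "positive" else -logit


def occlusion_attributions(tokens: List[str], label: str) -> List[Tuple[str, float]]:
    """Leave-one-out attribution: how much the label score drops when a token is removed."""
    full = label_score(tokens, label)
    return [
        (tok, full - label_score(tokens[:i] + tokens[i + 1:], label))
        for i, tok in enumerate(tokens)
    ]


def top_k_words(tokens: List[str], label: str, k: int) -> List[str]:
    """Return up to k distinct tokens with the highest attribution toward `label`."""
    ranked = sorted(occlusion_attributions(tokens, label), key=lambda p: p[1], reverse=True)
    words: List[str] = []
    for tok, _ in ranked:
        if len(words) >= k:
            break
        if tok not in words:
            words.append(tok)
    return words


def build_rationale(text: str, label: str, k: int = 2) -> str:
    """Phrase the top-k attributed words as a rationale (hypothetical template)."""
    quoted = ", ".join(f"'{w}'" for w in top_k_words(text.split(), label, k))
    return f"The key words: {quoted} are important clues to predict '{label}' as the correct answer."


def build_prompt(demos: List[Tuple[str, str]], query: str, k: int = 2) -> str:
    """Assemble a few-shot prompt whose demonstrations carry explanation-based rationales."""
    blocks = [
        f"Input: {text}\nRationale: {build_rationale(text, label, k)}\nLabel: {label}"
        for text, label in demos
    ]
    blocks.append(f"Input: {query}\nRationale:")
    return "\n\n".join(blocks)
```

For example, `build_prompt([("the plot was boring and awful", "negative")], "great plot")` yields a one-shot prompt whose demonstration rationale names 'awful' and 'boring' as the key words, leaving the model to complete the rationale and label for the query.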

翻译：大型语言模型在完成复杂任务方面展现出卓越能力。近期研究表明，在情境学习中加入人工标注的推理依据（如思维链提示）可显著提升模型性能，尤其对需要推理能力的任务效果显著。然而，此类推理依据的引入因需要大量人工参与而面临可扩展性挑战。本文提出新型框架AMPLIFY（通过利用情境学习与事后解释增强模型性能），通过自动化生成推理依据的流程来解决上述挑战。具体而言，我们利用输出归因分数（解释信息）的事后解释方法，捕捉各输入特征对模型预测的影响。我们构建了嵌入事后解释见解的自动化自然语言推理依据，为大型语言模型提供纠偏信号。基于真实数据集的广泛实验表明，AMPLIFY框架在广泛任务中可使预测准确率提升约10-25%，甚至包括那些依赖人工标注推理依据（如思维链提示）的传统方法难以胜任的任务。本研究首次揭示了事后解释作为增强大型语言模型有效性工具的潜力。此外，我们通过额外的实证分析与消融研究，展示了AMPLIFY各模块的实际影响，进而为优化情境学习提供了关键洞见。