Post Hoc Explanations of Language Models Can Improve Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities in performing complex tasks. Moreover, recent research has shown that incorporating human-annotated rationales (e.g., Chain-of-Thought prompting) during in-context learning can significantly enhance the performance of these models, particularly on tasks that require reasoning. However, incorporating such rationales is difficult to scale because it requires a high degree of human involvement. In this work, we present a novel framework, Amplifying Model Performance by Leveraging In-Context Learning with Post Hoc Explanations (AMPLIFY), which addresses this challenge by automating the process of rationale generation. To this end, we leverage post hoc explanation methods that output attribution scores (explanations) capturing the influence of each input feature on model predictions. More specifically, we construct automated natural language rationales that embed insights from post hoc explanations to provide corrective signals to LLMs. Extensive experimentation with real-world datasets demonstrates that our framework, AMPLIFY, leads to prediction accuracy improvements of about 10-25% over a wide range of tasks, including those where prior approaches that rely on human-annotated rationales, such as Chain-of-Thought prompting, fall short. Our work is one of the first attempts to highlight the potential of post hoc explanations as valuable tools for enhancing the effectiveness of LLMs. Furthermore, we conduct additional empirical analyses and ablation studies to demonstrate the impact of each component of AMPLIFY, which, in turn, lead to critical insights for refining in-context learning.
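To make the idea of "rationales built from attribution scores" concrete, here is a minimal sketch, not the AMPLIFY implementation. It assumes a simple leave-one-out (occlusion) attribution as a stand-in for whatever post hoc explainer is used, a toy keyword scorer as a hypothetical stand-in for a real model, and a hypothetical rationale template that names the top-k most influential words.

```python
from typing import Callable, List, Tuple


def occlusion_attributions(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Leave-one-out attribution: the score drop when each token is removed."""
    base = score_fn(tokens)
    attributions = []
    for i, tok in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]  # drop only the i-th occurrence
        attributions.append((tok, base - score_fn(occluded)))
    return attributions


def build_rationale(attributions: List[Tuple[str, float]], label: str, k: int = 3) -> str:
    """Turn the top-k attributed tokens into a natural language rationale.

    The sentence template is a hypothetical illustration.
    """
    top = sorted(attributions, key=lambda pair: pair[1], reverse=True)[:k]
    words = ", ".join(f'"{tok}"' for tok, _ in top)
    return f"The key words {words} are important clues to predict '{label}' as the correct answer."


# Hypothetical toy scorer standing in for a real model's class score.
POSITIVE_WEIGHTS = {"great": 2.0, "love": 1.5, "fun": 1.0}


def toy_score(tokens: List[str]) -> float:
    return sum(POSITIVE_WEIGHTS.get(t, 0.0) for t in tokens)


tokens = "i love this great and fun movie".split()
attrs = occlusion_attributions(tokens, toy_score)
rationale = build_rationale(attrs, "positive", k=3)
print(rationale)
```

A rationale like this could then be appended to a few-shot demonstration in the prompt. Occlusion is used here only because it needs no extra libraries. Gradient-based or perturbation-based explainers would fill the same role of ranking input features by influence.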

翻译：大型语言模型在执行复杂任务方面展现出卓越能力。此外，最新研究表明，在上下文学习过程中融入人工标注的原理（例如思维链提示）能显著提升这些模型的性能，尤其在需要推理能力的任务上。然而，此类原理的融入在可扩展性方面面临挑战，因为这需要大量人工参与。在本研究中，我们提出了一种新颖框架——利用事后解释的上下文学习增强模型性能（AMPLIFY），通过自动化原理生成过程来解决上述挑战。为此，我们采用可输出归因分数（解释）的事后解释方法，该分数能捕捉每个输入特征对模型预测的影响。具体而言，我们构建了自动化的自然语言原理，将事后解释的见解嵌入其中，为大型语言模型提供修正信号。基于真实数据集的广泛实验表明，我们的AMPLIFY框架在各类任务（包括那些依赖人工标注原理（如思维链提示）的先前方法表现不佳的任务）中，预测准确率提升了约10-25%。本研究率先揭示事后解释作为提升大型语言模型有效性工具的潜力。此外，我们通过额外实证分析和消融实验，论证了AMPLIFY各组件的贡献，从而为优化上下文学习提供关键洞察。