Elucidating the rationale behind neural models' outputs has long been a challenge in machine learning, and the question remains pressing in the age of large language models (LLMs) and in-context learning (ICL). For input attribution (IA), ICL raises a new issue: interpreting which example in the prompt, which consists of a set of demonstrations, contributed to identifying the task/rule to be solved. To address this, we introduce synthetic diagnostic tasks inspired by the poverty-of-the-stimulus design in inductive reasoning; here, most in-context examples are ambiguous with respect to their underlying rule, and a single critical example disambiguates the demonstrated task. The question is whether conventional IA methods can identify such an example when interpreting the inductive reasoning process in ICL. Our experiments yield several practical findings; for example, a certain simple IA method works best, and the larger the model, the harder it generally becomes to interpret ICL with gradient-based IA methods.
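To make the setup concrete, the following is a minimal, hypothetical sketch (not the paper's actual tasks or data) of a poverty-of-the-stimulus-style ICL prompt: most demonstrations are consistent with two competing rules (here, "copy" vs. "reverse"), and a single critical example disambiguates which rule is being demonstrated.

```python
# Hypothetical illustration of an ambiguous ICL prompt with one critical example.
# Palindromic inputs are consistent with both a "copy" rule and a "reverse" rule;
# only the critical example identifies the intended rule.

ambiguous_examples = [
    ("level", "level"),   # palindrome: copy and reverse yield the same output
    ("noon", "noon"),
    ("madam", "madam"),
]
critical_example = ("stack", "kcats")  # consistent only with the "reverse" rule

query = "prompt"

prompt = "\n".join(
    f"Input: {x}\nOutput: {y}"
    for x, y in ambiguous_examples + [critical_example]
) + f"\nInput: {query}\nOutput:"

print(prompt)

# An input-attribution (IA) method applied to the model's prediction for the
# query should, ideally, assign the highest attribution to the critical
# example, since it alone disambiguates the demonstrated rule.
```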