Large language models (LLMs) exhibit remarkable in-context learning (ICL) capabilities. However, the underlying working mechanism of ICL remains poorly understood. Recent research presents two conflicting views on ICL: one attributes it to LLMs' inherent task-recognition ability, deeming the label correctness and number of shots in demonstrations not crucial; the other emphasizes the impact of similar examples in the demonstrations, stressing the need for correct labels and more shots. In this work, we provide a Two-Dimensional Coordinate System that unifies both views into a systematic framework. The framework explains the behavior of ICL through two orthogonal variables: whether LLMs can recognize the task and whether similar examples are presented in the demonstrations. We propose the peak inverse rank metric to detect the task-recognition ability of LLMs and study LLMs' reactions to different definitions of similarity. Based on these, we conduct extensive experiments to elucidate how ICL functions in each quadrant on multiple representative classification tasks. Finally, we extend our analyses to generation tasks, showing that our coordinate system can also be used to interpret ICL for generation tasks effectively.