In-context learning (ICL) is an emerging few-shot learning paradigm built on modern language models (LMs), yet its inner mechanism remains unclear. In this paper, we investigate this mechanism from a novel perspective: information removal. Specifically, we demonstrate that in the zero-shot scenario, LMs encode queries into non-selective representations in their hidden states that contain information for all possible tasks, leading to arbitrary outputs that fail to focus on the intended task and yielding near-zero accuracy. Meanwhile, we find that selectively removing specific information from the hidden states with a low-rank filter effectively steers LMs toward the intended task. Building on these findings, by measuring the hidden states with carefully designed metrics, we observe that few-shot ICL effectively simulates such a task-oriented information removal process, selectively stripping redundant information from the entangled non-selective representations and improving the output based on the demonstrations; this constitutes a key mechanism underlying ICL. Moreover, we identify the essential attention heads that induce the removal operation, which we term Denoising Heads. Ablation experiments that block this information removal operation during inference significantly degrade ICL accuracy, especially when the correct label is absent from the few-shot demonstrations, confirming the critical role of both the information removal mechanism and the denoising heads.
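The low-rank filtering idea mentioned above can be sketched as projecting hidden states onto the orthogonal complement of a small set of directions, so that the information carried along those directions is removed. The following is a minimal illustrative sketch, not the paper's actual implementation; the function name, the choice of directions, and the use of NumPy are all assumptions for exposition.

```python
import numpy as np

def low_rank_removal_filter(hidden, directions):
    """Remove the components of `hidden` lying in the span of `directions`.

    hidden:     (d,) or (n, d) array of hidden states.
    directions: (k, d) array of k directions to remove, with k << d
                (hypothetical; in practice these would be found empirically).
    """
    # Orthonormalize the directions so the projection is well defined.
    q, _ = np.linalg.qr(directions.T)      # (d, k), orthonormal columns
    projector = q @ q.T                    # rank-k projector onto the span
    # Subtract the rank-k component: the result is orthogonal to every
    # removed direction, i.e. that information is filtered out.
    return hidden - hidden @ projector

# Illustrative usage: strip one direction from a 4-dimensional state.
h = np.array([1.0, 2.0, 3.0, 4.0])
v = np.array([[1.0, 0.0, 0.0, 0.0]])       # remove the first coordinate
h_filtered = low_rank_removal_filter(h, v)
```

Because the filter is a fixed linear projection, it can be applied to every hidden state at a chosen layer without retraining the model, which is what makes it a convenient tool for steering.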