In-context learning, i.e., learning from in-context samples, is an impressive ability of Transformers. However, the mechanism driving in-context learning is not yet fully understood. In this study, we investigate it from the underexplored perspective of representation learning. Representations are more complex in the in-context learning scenario, where they can be affected by both the model weights and the in-context samples. We refer to these two conceptual aspects of the representation as the in-weights component and the in-context component, respectively. To study how the two components affect in-context learning capabilities, we construct a novel synthetic task that makes it possible to devise two probes, an in-weights probe and an in-context probe, which evaluate the two components, respectively. We demonstrate that the quality of the in-context component is closely related to in-context learning performance, which indicates an entanglement between in-context learning and representation learning. Furthermore, we find that a good in-weights component can actually benefit the learning of the in-context component, indicating that in-weights learning should be the foundation of in-context learning. To further understand the in-context learning mechanism and the importance of the in-weights component, we prove by construction that a simple Transformer, which uses pattern matching and a copy-paste mechanism to perform in-context learning, can match the in-context learning performance of a more complex, best-tuned Transformer under the assumption of a perfect in-weights component. In short, these findings from the representation learning perspective shed light on new approaches to improving in-context learning capacity.
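The abstract does not spell out the construction, so the following is only a rough illustration of what "pattern matching plus copy-paste" can mean. It is not the authors' construction. It assumes that each in-context sample is an (input representation, label) pair, that a "perfect in-weights component" can be approximated by orthonormal one-hot input representations, and that a single softmax attention step is used. The function name `pattern_match_copy` and the sharpness parameter `beta` are hypothetical. The query is matched against every in-context input by a scaled dot product, which is the pattern-matching step. The label attached to the best match is then copied as an attention-weighted sum of one-hot labels, which is the copy-paste step.

```python
import math
from typing import List, Sequence


def pattern_match_copy(
    context_x: Sequence[Sequence[float]],
    context_y: Sequence[int],
    query: Sequence[float],
    num_labels: int,
    beta: float = 10.0,  # hypothetical sharpness of the attention softmax
) -> List[float]:
    """Hypothetical sketch: match the query against in-context inputs, then copy their labels."""
    # Pattern matching: scaled dot-product score between the query and each in-context input.
    scores = [beta * sum(q * k for q, k in zip(query, x)) for x in context_x]
    # Numerically stable softmax over the in-context samples.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Copy-paste: attention-weighted sum of the one-hot labels of the matched samples.
    probs = [0.0] * num_labels
    for w, y in zip(weights, context_y):
        probs[y] += w
    return probs


if __name__ == "__main__":
    # Assumed "perfect" representations: orthonormal one-hot vectors, one per input.
    context_x = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    context_y = [2, 0, 1]  # labels paired with each in-context input
    query = [0.0, 1.0, 0.0]  # identical to the second in-context input
    probs = pattern_match_copy(context_x, context_y, query, num_labels=3)
    print(max(range(3), key=probs.__getitem__))  # predicted label
```

With exact one-hot representations, the query's score is strictly higher for its matching in-context sample, so the copied label is the one paired with that sample. If the representations were noisy or overlapping, the weights would spread across several samples, and the copied distribution would blur. This is one intuitive way to read the claim that a good in-weights component helps in-context learning.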