In-Context Learning (ICL) endows Large Language Models (LLMs) with the ability to learn from context, achieving downstream generalization without gradient updates, given only a few in-context examples. Despite this encouraging empirical success, the underlying mechanism of ICL remains unclear, and existing research offers diverse viewpoints for understanding it. These studies propose intuition-driven and ad-hoc technical solutions for interpreting ICL, leaving the road map ambiguous. In this paper, we adopt a data generation perspective to reinterpret recent efforts and demonstrate the potential broader applicability of popular technical solutions, moving toward a systematic angle. For conceptual definitions, we rigorously adopt the terms skill learning and skill recognition; the difference between them is that skill learning can learn new data generation functions from in-context data, whereas skill recognition cannot. We also provide a comprehensive study of the merits and weaknesses of different solutions, and highlight the uniformity among them under the data generation perspective, establishing a technical foundation for future research to incorporate the strengths of different lines of work.