In-context learning (ICL) is a type of prompting where a transformer model operates on a sequence of (input, output) examples and performs inference on the fly. In this work, we formalize in-context learning as an algorithm learning problem where a transformer model implicitly constructs a hypothesis function at inference time. We first explore the statistical aspects of this abstraction through the lens of multitask learning (MTL): we obtain generalization bounds for ICL when the input prompt is (1) a sequence of i.i.d. (input, label) pairs or (2) a trajectory arising from a dynamical system. The crux of our analysis is relating the excess risk to the stability of the algorithm implemented by the transformer. We characterize when the transformer/attention architecture provably obeys the stability condition and also provide empirical verification. For generalization on unseen tasks, we identify an inductive bias phenomenon in which the transfer learning risk is governed by the task complexity and the number of MTL tasks in a highly predictable manner. Finally, we provide numerical evaluations that (1) demonstrate transformers can indeed implement near-optimal algorithms on classical regression problems with i.i.d. and dynamic data, (2) provide insights on stability, and (3) verify our theoretical predictions.
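To make the "prompt as a dataset, model as an algorithm" view concrete, the following is a minimal sketch under stated assumptions, not the method or the experimental setup of this work. It assumes a one-dimensional linear regression task, uses ridge regression as a stand-in for the algorithm a transformer might implement in context, and probes stability by replacing a single prompt example. The names `make_prompt`, `ridge_in_context`, and `stability_gap`, as well as the noise level and regularization strength, are hypothetical choices made for illustration.

```python
import random


def make_prompt(rng, n_examples, noise_std=0.1):
    """Sample one scalar linear task and an i.i.d. prompt of (input, label) pairs."""
    w = rng.gauss(0.0, 1.0)  # hidden task parameter (assumed Gaussian)
    prompt = []
    for _ in range(n_examples):
        x = rng.gauss(0.0, 1.0)
        y = w * x + rng.gauss(0.0, noise_std)
        prompt.append((x, y))
    return prompt, w


def ridge_in_context(prompt, x_query, lam=1e-2):
    """Stand-in 'in-context algorithm': fit scalar ridge regression on the
    prompt, then predict the label of the query input."""
    sxy = sum(x * y for x, y in prompt)
    sxx = sum(x * x for x, _ in prompt)
    w_hat = sxy / (sxx + lam)
    return x_query * w_hat


def stability_gap(prompt, x_query, rng, lam=1e-2):
    """Absolute change in the prediction when one prompt example is replaced
    by a freshly drawn, unrelated (input, label) pair."""
    perturbed = list(prompt)
    i = rng.randrange(len(perturbed))
    perturbed[i] = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
    before = ridge_in_context(prompt, x_query, lam)
    after = ridge_in_context(perturbed, x_query, lam)
    return abs(before - after)


if __name__ == "__main__":
    rng = random.Random(0)
    x_query = 1.5
    for n in (10, 40, 160):
        prompt, w = make_prompt(rng, n)
        err = abs(ridge_in_context(prompt, x_query) - w * x_query)
        gap = stability_gap(prompt, x_query, rng)
        print(f"n={n:4d}  prediction error={err:.4f}  stability gap={gap:.4f}")
```

For this stand-in algorithm, the effect of swapping one example tends to shrink as the prompt grows, which is the kind of stability property the analysis relates to excess risk. Whether a trained transformer behaves the same way is an empirical question that this sketch does not answer.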