Transformers effectively infer a latent task from context through two inference modes: recognizing a task seen during training, and adapting to a novel one. Recent interpretability studies have identified task-specific directions in middle-layer representations, known as task vectors, that steer model behavior. However, the lack of rigorous foundations makes it hard to connect internal representations to external model behavior: existing work does not explain how task-vector geometry is shaped by the training distribution, nor which geometry enables out-of-distribution (OOD) generalization. In this paper, we study these questions in a controlled synthetic setting: we train small transformers from scratch on latent-task sequence distributions, a setting that admits a principled mathematical characterization. We show that the two inference modes can coexist within a single model. In-distribution behavior is governed by Bayesian task retrieval, implemented internally through convex combinations of learned task vectors. OOD behavior, by contrast, arises through extrapolative task learning, whose representations occupy a subspace nearly orthogonal to the task-vector subspace. Taken together, our results suggest that task-vector geometry, training distributions, and generalization behaviors are closely related.
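The two mechanisms named in the abstract can be illustrated with a minimal toy sketch. It is not the paper's model, data, or code. The discrete tasks with i.i.d. tokens, the fixed task vectors, and all function names (`posterior_over_tasks`, `convex_combination`, `orthonormal_basis`, `fraction_in_subspace`) are hypothetical assumptions. The sketch computes a Bayesian posterior over a finite set of training tasks and forms the posterior-weighted combination of their task vectors. Because the weights are nonnegative and sum to 1, this combination is convex. It then measures how much of a representation lies in the task-vector subspace, so a vector orthogonal to that subspace scores 0.

```python
import math

# Hypothetical setup (an assumption, not the paper's data): each latent task k is
# a categorical distribution over a small vocabulary, tokens are i.i.d. given k,
# and each task has a fixed "task vector" in a d-dimensional representation space.

def posterior_over_tasks(context, task_probs, prior):
    """Return p(k | context) for every task k, computed in log space for stability."""
    log_scores = []
    for k, probs in enumerate(task_probs):
        score = math.log(prior[k])
        for token in context:
            score += math.log(probs[token])
        log_scores.append(score)
    shift = max(log_scores)  # log-sum-exp trick to avoid underflow
    unnorm = [math.exp(s - shift) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def convex_combination(weights, task_vectors):
    """Posterior-weighted sum of task vectors; weights are nonnegative and sum to 1."""
    dim = len(task_vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, task_vectors)) for i in range(dim)]

def orthonormal_basis(vectors, eps=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = sum(x * y for x, y in zip(w, b))
            w = [x - c * y for x, y in zip(w, b)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm > eps:  # skip (near-)linearly dependent vectors
            basis.append([x / norm for x in w])
    return basis

def fraction_in_subspace(u, task_vectors):
    """Share of ||u||^2 lying in span(task_vectors): 1 = inside, 0 = orthogonal."""
    basis = orthonormal_basis(task_vectors)
    proj_sq = sum(sum(x * y for x, y in zip(u, b)) ** 2 for b in basis)
    return proj_sq / sum(x * x for x in u)

# Toy usage with made-up numbers: two tasks over a 3-token vocabulary.
TASK_PROBS = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]]
PRIOR = [0.5, 0.5]
TASK_VECTORS = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]

weights = posterior_over_tasks([0, 0, 1], TASK_PROBS, PRIOR)
retrieved = convex_combination(weights, TASK_VECTORS)
print("posterior:", weights)
print("retrieved representation:", retrieved)
print("in-subspace fraction:", fraction_in_subspace(retrieved, TASK_VECTORS))
print("orthogonal probe:", fraction_in_subspace([0.0, 0.0, 1.0, 0.0], TASK_VECTORS))
```

In this toy picture, in-distribution "retrieval" always stays inside the span of the learned task vectors. A representation that scores near 0 on `fraction_in_subspace` cannot be produced this way, which loosely mirrors the near-orthogonality the abstract reports for OOD behavior.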