Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization: a learned process that runs within a model's forward pass and consists of two steps, (i) constructing an internal learning objective and (ii) finding its solution through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks and uncover the gradient-based mesa-optimization algorithms that drive their predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can improve performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
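To make the two-step description concrete, here is a minimal NumPy sketch. It rests on an assumption that the abstract does not state: that the internal objective at step t is a least-squares loss over the key/value pairs seen so far, L_t(W) = 1/2 Σ_{t'≤t} ||v_{t'} − W k_{t'}||². Under that assumption, a single gradient step from W = 0 reduces to causal linear attention, which illustrates what "gradient-based mesa-optimization" can look like in a forward pass. A mesa-layer-style operation instead solves the (ridge-regularized) objective exactly at every step. The function names and the parameters `eta` and `lam` are hypothetical illustrations, not the paper's code.

```python
import numpy as np

def linear_attention_gd_step(K, V, Q, eta):
    """Hypothetical sketch: one gradient step on the assumed in-context loss.

    At time t the loss is 1/2 * sum_{t'<=t} ||v_t' - W k_t'||^2. Its gradient
    at W = 0 is -sum_{t'<=t} v_t' k_t'^T, so one step with learning rate eta
    gives W_t = eta * sum_{t'<=t} v_t' k_t'^T. The prediction W_t q_t equals
    an unnormalized causal linear-attention output.
    """
    T = K.shape[0]
    out = np.zeros((T, V.shape[1]))
    for t in range(T):
        W = eta * V[: t + 1].T @ K[: t + 1]  # (d_v, d_k): sum of v k^T so far
        out[t] = W @ Q[t]
    return out

def mesa_layer(K, V, Q, lam):
    """Hypothetical sketch of a causal ridge-regression layer.

    At each t, W_t = argmin_W 1/2 sum_{t'<=t} ||v_t' - W k_t'||^2 + lam/2 ||W||_F^2,
    whose closed form is W_t = (sum v k^T)(sum k k^T + lam I)^{-1}.
    The output at step t is W_t q_t.
    """
    T, d_k = K.shape
    out = np.zeros((T, V.shape[1]))
    for t in range(T):
        Kt, Vt = K[: t + 1], V[: t + 1]
        A = Kt.T @ Kt + lam * np.eye(d_k)  # (d_k, d_k), symmetric positive definite
        B = Vt.T @ Kt                      # (d_v, d_k)
        W = np.linalg.solve(A, B.T).T      # W = B A^{-1}, valid because A is symmetric
        out[t] = W @ Q[t]
    return out

# Usage: keys and values generated by a fixed linear map W_star.
rng = np.random.default_rng(0)
T, d = 50, 4
K = rng.normal(size=(T, d))
Q = rng.normal(size=(T, d))
W_star = rng.normal(size=(d, d))
V = K @ W_star.T  # row t is (W_star k_t)^T

exact = mesa_layer(K, V, Q, lam=1e-6)            # solves the objective at each step
approx = linear_attention_gd_step(K, V, Q, eta=0.1)  # one gradient step only
print(np.abs(exact[-1] - W_star @ Q[-1]).max())  # small: the map is recovered in context
```

This version re-solves a d_k × d_k system at every step for clarity; an efficient implementation would instead update the inverse recursively as new tokens arrive.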