In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models. By examining commonly employed synthetic ICL tasks, we demonstrate that multi-layer perceptrons (MLPs) can also learn in-context. Moreover, MLPs, and the closely related MLP-Mixer models, learn in-context competitively with Transformers given the same compute budget in this setting. We further show that MLPs outperform Transformers on a series of classical tasks from psychology designed to test relational reasoning, which are closely related to in-context classification. These results underscore a need for studying in-context learning beyond attention-based architectures, while also challenging strong prior arguments about MLPs' limited ability to solve relational tasks. Altogether, our results highlight the unexpected competence of MLPs, and support the growing interest in all-MLP alternatives to task-specific architectures.