The recent surge of large language models (LLMs) highlights their ability to perform in-context learning, i.e., "learning" to perform a task from a few demonstrations in the context without any parameter updates. However, their in-context learning capabilities are limited by the model architecture: 1) the number of demonstrations is capped by the maximum sequence length imposed by positional embeddings; 2) the quadratic complexity of attention prevents users from using more demonstrations efficiently; 3) LLMs have been shown to be sensitive to the order of the demonstrations. In this work, we tackle these challenges by proposing a better architectural design for in-context learning. We propose SAICL (Structured Attention for In-Context Learning), which replaces full attention with a structured attention mechanism designed for in-context learning, removing unnecessary dependencies between individual demonstrations while making the model invariant to the permutation of demonstrations. We evaluate SAICL in a meta-training framework and show that it achieves comparable or better performance than full attention while obtaining up to a 3.4x inference speed-up. SAICL also consistently outperforms a strong Fusion-in-Decoder (FiD) baseline, which processes each demonstration independently. Finally, thanks to its linear complexity, we demonstrate that SAICL can easily scale to hundreds of demonstrations, with performance continuing to improve as more demonstrations are added.
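To make the contrast between full and structured attention concrete, the following is a minimal sketch, not the SAICL implementation. It assumes one plausible masking pattern: each demonstration attends only to its own tokens, while the test input attends to every token. The function names (`structured_mask`, `allowed_pairs`, `full_pairs`) and the exact pattern are hypothetical and chosen for illustration only; the abstract does not specify them.

```python
def structured_mask(demo_lens, query_len):
    """Build a boolean mask; mask[i][j] is True if token i may attend to token j.

    Assumed pattern (illustrative, not taken from the paper):
    - tokens of a demonstration attend only within that demonstration;
    - tokens of the test input (placed last) attend to all tokens.
    """
    total = sum(demo_lens) + query_len
    mask = [[False] * total for _ in range(total)]
    start = 0
    for n in demo_lens:
        # Block-diagonal part: no attention across different demonstrations.
        for i in range(start, start + n):
            for j in range(start, start + n):
                mask[i][j] = True
        start += n
    # Test-input rows: attend to every demonstration and to the input itself.
    for i in range(start, total):
        for j in range(total):
            mask[i][j] = True
    return mask


def allowed_pairs(mask):
    """Count attended (query, key) pairs, a rough proxy for attention cost."""
    return sum(sum(row) for row in mask)


def full_pairs(demo_lens, query_len):
    """Attended pairs under full attention over the concatenated sequence."""
    total = sum(demo_lens) + query_len
    return total * total


if __name__ == "__main__":
    for k in (4, 8, 16):
        lens = [3] * k  # k demonstrations of 3 tokens each, test input of 2 tokens
        print(k, allowed_pairs(structured_mask(lens, 2)), full_pairs(lens, 2))
```

Under this assumed pattern, with k demonstrations of n tokens each and a test input of q tokens, the number of attended pairs is k·n² + q·(k·n + q), which grows linearly in k, whereas full attention costs (k·n + q)². Dropping cross-demonstration attention is only one ingredient of permutation invariance; how positions are assigned to each demonstration also matters, and the sketch does not model that.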