In addition to generating fluent text in various languages, large language models have been successful at tasks that involve basic forms of logical "reasoning" over their context. Recent work found that selectively removing certain components from weight matrices in pre-trained models can improve such reasoning capabilities. We investigate this phenomenon further by studying how certain global associations tend to be stored in specific weight components or Transformer blocks, in particular feed-forward layers. Such associations may hurt predictions in reasoning tasks, and removing the corresponding components may then improve performance. We analyze how this arises during training, both empirically and theoretically, in three settings: a two-layer Transformer trained on a basic reasoning task with noise, a toy associative memory model, and the Pythia family of pre-trained models tested on simple reasoning tasks.