Large Language Models (LLMs) have been shown to be effective models of the human language system, with some models predicting most of the explainable variance in brain activity on current datasets. Even in untrained models, the representations induced by architectural priors can exhibit reasonable alignment with brain data. In this work, we investigate the key architectural components driving this surprising alignment in untrained models. To estimate LLM-to-brain similarity, we first select language-selective units within an LLM, similar to how neuroscientists identify the language network in the human brain. We then benchmark the brain alignment of these LLM units across five different brain recording datasets. By isolating critical components of the Transformer architecture, we identify tokenization strategy and multi-head attention as the two major components driving brain alignment; a simple form of recurrence further improves alignment. We further substantiate this quantitative brain alignment by reproducing landmark studies in language neuroscience, showing that localized model units -- just like language voxels measured empirically in the human brain -- discriminate lexical differences more reliably than syntactic ones, and exhibit similar response profiles under the same experimental conditions. Finally, we demonstrate the utility of our model's representations for language modeling, achieving better sample and parameter efficiency than comparable architectures. Our model's estimates of surprisal set a new state of the art in behavioral alignment with human reading times. Taken together, we propose a highly brain- and behaviorally-aligned model that conceptualizes the human language system as an untrained, shallow feature encoder with structural priors, combined with a trained decoder, to achieve efficient and performant language processing.
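The unit-selection step described above can be sketched as follows. This is a minimal, hypothetical illustration (not the paper's exact procedure): units are ranked by a sentences-versus-non-words contrast, mirroring the fMRI language-localizer logic, and the most selective fraction is retained. The function name, the Welch-style per-unit t-statistic, and the random activations are all assumptions for demonstration.

```python
import numpy as np

def localize_language_units(act_sentences, act_nonwords, top_frac=0.01):
    """Select 'language-selective' model units via a sentences > non-words
    contrast, analogous to an fMRI language localizer.

    act_sentences, act_nonwords: (n_stimuli, n_units) activation matrices.
    Returns indices of the top_frac most selective units.
    Illustrative sketch only; the actual selection criterion may differ.
    """
    mean_diff = act_sentences.mean(axis=0) - act_nonwords.mean(axis=0)
    # Welch-style standard error of the difference in means, per unit
    se = np.sqrt(act_sentences.var(axis=0, ddof=1) / len(act_sentences)
                 + act_nonwords.var(axis=0, ddof=1) / len(act_nonwords))
    t = mean_diff / (se + 1e-9)
    k = max(1, int(top_frac * t.size))
    return np.argsort(t)[-k:]  # indices of the k most selective units

# Toy demo: random activations standing in for real model responses
rng = np.random.default_rng(0)
sent = rng.normal(1.0, 1.0, size=(40, 512))  # hypothetical sentence responses
nonw = rng.normal(0.0, 1.0, size=(40, 512))  # hypothetical non-word responses
units = localize_language_units(sent, nonw, top_frac=0.05)
print(len(units))  # 5% of 512 units -> 25
```

The selected indices would then define the model's "language network", whose responses are compared against brain recordings (e.g., via regression-based alignment metrics).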