Large language models (LLMs) exhibit remarkable capabilities not only on language tasks, but also on various tasks that are not linguistic in nature, such as logical reasoning and social inference. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. Here we ask whether similar specialization for language emerges in LLMs. Using the same localization approach employed in neuroscience, we identify language-selective units within 18 popular LLMs. We then establish the causal role of these units by demonstrating that ablating an LLM's language-selective units, but not random units, leads to drastic deficits on language tasks. Correspondingly, language-selective LLM units are more aligned with brain recordings from the human language system than random units are. Finally, we investigate whether our localization method extends to other cognitive domains: while we find specialized networks for reasoning and social capabilities in some LLMs, there are substantial differences among models. These findings provide functional and causal evidence for specialization in large language models and highlight parallels with the functional organization of the brain.