Understanding the similarity between large language models (LLMs) and human brain activity is crucial for advancing both AI and cognitive neuroscience. In this study, we provide a multilingual, large-scale assessment of this similarity by systematically comparing 16 publicly available pretrained LLMs with human brain responses during natural language processing tasks in both English and Chinese. Specifically, we use ridge regression to assess the representational similarity between LLM embeddings and electroencephalography (EEG) signals, and we analyze the similarity between the "neural trajectory" and the "LLM latent trajectory." This trajectory-based analysis captures key dynamic patterns, such as magnitude, angle, uncertainty, and confidence. Our findings highlight both similarities and crucial differences in processing strategies: (1) middle-to-high layers of LLMs are central to semantic integration and correspond to the N400 component observed in EEG; (2) the brain exhibits continuous and iterative processing during reading, whereas LLMs often show discrete, stage-end bursts of activity, which suggests a stark contrast in their real-time semantic processing dynamics. This study may offer new insights into LLMs and neural processing, and it establishes a framework for future investigations into the alignment between artificial intelligence and biological intelligence.
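The two analyses named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the regularization strength, the cross-validation scheme, or how trajectory magnitude and angle are defined. Here we assume a closed-form ridge fit without intercept, Pearson correlation as the similarity score, step length between consecutive states as "magnitude", and the angle between consecutive steps as "angle". All function names (`ridge_fit`, `predict`, `pearson`, `step_magnitudes`, `turning_angles`) are hypothetical, and only the Python standard library is used so the example stays self-contained.

```python
import math


def ridge_fit(X, y, alpha):
    """Closed-form ridge weights w = (X^T X + alpha*I)^-1 X^T y, no intercept.

    X: list of feature rows (e.g. LLM embeddings), y: list of targets
    (e.g. one EEG channel at one latency). alpha is an assumed hyperparameter.
    """
    n, d = len(X), len(X[0])
    # Augmented system [X^T X + alpha*I | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) + (alpha if i == j else 0.0)
         for j in range(d)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(d)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(d):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] / A[i][i] for i in range(d)]


def predict(X, w):
    """Linear prediction X @ w."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def step_magnitudes(traj):
    """Euclidean length of each step between consecutive states."""
    return [math.dist(p, q) for p, q in zip(traj, traj[1:])]


def turning_angles(traj):
    """Angle in radians between each pair of consecutive steps."""
    steps = [[b - a for a, b in zip(p, q)] for p, q in zip(traj, traj[1:])]
    angles = []
    for u, v in zip(steps, steps[1:]):
        cos = sum(a * b for a, b in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
        # Clamp to guard against floating-point drift outside [-1, 1].
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    return angles


if __name__ == "__main__":
    # Toy data: 4 "words", 2 embedding dimensions, 1 synthetic target signal.
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [2.0, -1.0, 1.0, 3.0]
    w = ridge_fit(X, y, alpha=0.1)
    print("weights:", w)
    print("fit correlation:", pearson(predict(X, w), y))
    print("step magnitudes:", step_magnitudes(X))
    print("turning angles:", turning_angles(X))
```

In a realistic setting, the correlation would be computed on held-out data, and the same trajectory features would be extracted from both the EEG time course and the layer-by-layer LLM states before the two are compared. A library such as NumPy or scikit-learn would normally replace the hand-written solver.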