Large Language Models (LLMs) have demonstrated remarkable abilities in text comprehension and logical reasoning, indicating that the text representations learned by LLMs facilitate their language processing capabilities. In cognitive science, brain cognitive processing signals are commonly used to study human language processing. It is therefore natural to ask how well the text embeddings from LLMs align with brain cognitive processing signals, and how training strategies affect this LLM-brain alignment. In this paper, we employ Representational Similarity Analysis (RSA) to measure the alignment between 23 mainstream LLMs and functional magnetic resonance imaging (fMRI) signals of the brain, evaluating how effectively LLMs simulate cognitive language processing. We empirically investigate the impact of various factors (e.g., pre-training data size, model scaling, alignment training, and prompts) on LLM-brain alignment. Experimental results indicate that pre-training data size and model scaling are positively correlated with LLM-brain similarity, and that alignment training significantly improves it. Explicit prompts strengthen the consistency of LLMs with brain cognitive language processing, whereas nonsensical noisy prompts may attenuate this alignment. Moreover, LLM performance on a wide range of evaluations (e.g., MMLU, Chatbot Arena) is highly correlated with LLM-brain similarity.
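RSA compares two representational spaces without requiring a learned mapping between them: each space is summarized by a representational dissimilarity matrix (RDM) over the same set of stimuli, and alignment is scored as the rank correlation between the two RDMs. Below is a minimal sketch of this computation, assuming sentence-level LLM embeddings and fMRI voxel patterns for the same stimuli are available as NumPy arrays; the function names, dissimilarity choice (1 - Pearson r), and toy data are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal RSA sketch (illustrative; not the paper's actual implementation).
import numpy as np
from scipy.stats import spearmanr

def rdm(features: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: 1 - pairwise Pearson r
    between all stimulus pairs. features: (n_stimuli, n_dims)."""
    return 1.0 - np.corrcoef(features)

def rsa_similarity(llm_embeddings: np.ndarray, fmri_patterns: np.ndarray) -> float:
    """Spearman correlation between the upper triangles of the two RDMs."""
    iu = np.triu_indices(llm_embeddings.shape[0], k=1)
    rho, _ = spearmanr(rdm(llm_embeddings)[iu], rdm(fmri_patterns)[iu])
    return rho

# Toy usage with random data standing in for real stimuli.
rng = np.random.default_rng(0)
emb = rng.normal(size=(50, 768))    # e.g., 50 sentences x 768-d LLM embeddings
bold = rng.normal(size=(50, 2000))  # e.g., 50 sentences x 2000 fMRI voxels
print(f"LLM-brain RSA similarity: {rsa_similarity(emb, bold):.3f}")
```

Because RSA operates on pairwise dissimilarity structure rather than raw features, it sidesteps the dimensionality mismatch between embedding spaces and voxel spaces, which is why it suits comparisons across 23 heterogeneous LLMs.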