Pre-trained computational language models have recently made remarkable progress in acquiring language abilities that were once considered unique to humans. Their success has raised interest in whether these models represent and process language as humans do. To answer this question, this paper proposes MulCogBench, a multi-modal cognitive benchmark dataset collected from native speakers of Chinese and English. It encompasses a variety of cognitive data, including subjective semantic ratings, eye-tracking, functional magnetic resonance imaging (fMRI), and magnetoencephalography (MEG). To assess the relationship between language models and cognitive data, we conducted a similarity-encoding analysis, which decodes cognitive data based on its pattern similarity with textual embeddings. Results show that language models share significant similarities with human cognitive data and that the similarity patterns are modulated by data modality and stimulus complexity. Specifically, context-aware models outperform context-independent models as the complexity of the language stimuli increases. The shallow layers of context-aware models are better aligned with the high-temporal-resolution MEG signals, whereas the deeper layers show more similarity with the high-spatial-resolution fMRI. These results indicate that language models have a nuanced relationship with brain language representations. Moreover, the results for Chinese and English are highly consistent, suggesting that these findings generalize across languages.
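The similarity-encoding analysis described above can be illustrated with a minimal sketch. It assumes one common leave-one-out formulation: the cognitive pattern of a held-out stimulus is predicted as a weighted sum of the patterns of all other stimuli, with weights given by the Pearson correlation between model embeddings. The abstract does not specify the exact procedure, so the function names (`similarity_encode`, `encoding_score`), the choice of Pearson correlation, the unnormalized weighting, and the synthetic data are all assumptions made for illustration rather than the MulCogBench implementation.

```python
import numpy as np


def similarity_encode(embeddings, cognitive):
    """Leave-one-out similarity encoding (illustrative formulation, assumed).

    embeddings: (n_items, n_dims) model vectors, one row per stimulus.
    cognitive:  (n_items, n_features) measured patterns for the same stimuli.
    Returns an (n_items, n_features) array of predicted cognitive patterns.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    cognitive = np.asarray(cognitive, dtype=float)
    n_items = embeddings.shape[0]

    # Pearson correlation between every pair of stimulus embeddings.
    model_sim = np.corrcoef(embeddings)

    predicted = np.zeros_like(cognitive)
    for i in range(n_items):
        others = np.arange(n_items) != i      # exclude the held-out stimulus
        weights = model_sim[i, others]        # its similarity to the others
        predicted[i] = weights @ cognitive[others]
    return predicted


def encoding_score(predicted, actual):
    """Mean Pearson correlation between predicted and measured patterns."""
    scores = [np.corrcoef(p, a)[0, 1] for p, a in zip(predicted, actual)]
    return float(np.mean(scores))


# Synthetic stand-ins for real data (hypothetical sizes): 200 stimuli,
# 10-dimensional embeddings, 30 cognitive features (e.g. voxels or sensors).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 10))
cognitive = embeddings @ rng.normal(size=(10, 30))

score = encoding_score(similarity_encode(embeddings, cognitive), cognitive)
print(f"mean item-wise correlation: {score:.3f}")
```

In this sketch, a higher score means that the similarity structure of the embeddings carries more information about the cognitive patterns. Comparing layers or models would amount to repeating the call with different `embeddings` while keeping `cognitive` fixed.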