A fundamental question in neurolinguistics concerns the brain regions involved in syntactic and semantic processing during speech comprehension, at both the lexical level (word processing) and the supra-lexical level (sentence and discourse processing). To what extent are these regions separated or intertwined? To address this question, we trained a lexical language model, GloVe, and a supra-lexical language model, GPT-2, on a text corpus from which we selectively removed either syntactic or semantic information. We then assessed how well these information-restricted models predicted the time courses of the fMRI signal of humans listening to naturalistic text. We also manipulated the amount of contextual information provided to GPT-2 in order to determine the integration windows of the brain regions involved in supra-lexical processing. Our analyses show that, while most brain regions involved in language are sensitive to both syntactic and semantic variables, the relative magnitudes of these effects vary substantially across regions. Furthermore, we found an asymmetry between the left and right hemispheres: semantic and syntactic processing are more dissociated in the left hemisphere than in the right, and the left and right hemispheres show greater sensitivity to short and long contexts, respectively. The use of information-restricted NLP models thus sheds new light on the spatial organization of syntactic processing, semantic processing and compositionality.
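The comparison described in the abstract, scoring how well features from a full or information-restricted model predict fMRI time courses, can be illustrated with a generic encoding-model sketch. The abstract does not state the regression method, so the ridge regression, the held-out Pearson correlation score, the synthetic data, and all function and variable names below are assumptions for illustration only, not the authors' pipeline. The features are assumed to be already aligned to the scan times.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def voxelwise_r(Y_true, Y_pred):
    """Pearson correlation between true and predicted signal, per voxel (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0)
    return (yt * yp).sum(axis=0) / denom

def encoding_score(X, Y, n_train, alpha=1.0):
    """Fit on the first n_train scans, score on the remaining scans."""
    W = fit_ridge(X[:n_train], Y[:n_train], alpha)
    return voxelwise_r(Y[n_train:], X[n_train:] @ W)

# Synthetic stand-ins (assumption): rows are fMRI scans, columns are
# model features (X) or voxels (Y). No real data is used here.
rng = np.random.default_rng(0)
n_scans, n_features, n_voxels = 400, 20, 5
X_full = rng.standard_normal((n_scans, n_features))
W_true = rng.standard_normal((n_features, n_voxels))
Y = X_full @ W_true + 0.5 * rng.standard_normal((n_scans, n_voxels))

# Hypothetical "information-restricted" feature space: here simply a
# subset of the columns, standing in for a model trained on a corpus
# with one type of information removed.
X_restricted = X_full[:, :10]

r_full = encoding_score(X_full, Y, n_train=300)
r_restricted = encoding_score(X_restricted, Y, n_train=300)
print("mean r, full features:      ", r_full.mean())
print("mean r, restricted features:", r_restricted.mean())
```

In this toy setting, the drop in held-out correlation from the full to the restricted feature space plays the role of a region's sensitivity to the removed information. A real analysis would additionally need to account for the hemodynamic response and to select the regularization strength by cross-validation.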