Previous work has reported correlations between the hidden states of large language models and fMRI brain responses on language tasks. These correlations have been taken as evidence that the models and the brain represent language in similar ways. This study tests whether those results are robust to several possible concerns. Specifically, it shows that (i) the correlations persist after dimensionality reduction, and thus are not attributable to the curse of dimensionality; (ii) they are confirmed when using new measures of similarity; (iii) correlations between brain representations and model representations are specific to models trained on human language; and (iv) the results depend on the presence of positional encoding in the models. These findings confirm and strengthen previous research and contribute to the debate on the biological plausibility and interpretability of state-of-the-art large language models.
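To make point (i) concrete, the following is a minimal sketch of how a similarity score between model hidden states and brain responses can be recomputed after dimensionality reduction. It is not the study's pipeline: the data are synthetic, the array sizes are arbitrary, and the choices of PCA for reduction and of a representational-dissimilarity comparison for scoring are assumptions made for illustration. The function names `pca_reduce`, `rdm`, and `rsa_score` are hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Project the rows of X onto its top-k principal components."""
    Xc = X - X.mean(axis=0, keepdims=True)          # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                            # shape: (n_samples, k)


def rdm(X):
    """Dissimilarity matrix: Euclidean distance between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def rsa_score(A, B):
    """Pearson correlation between the upper triangles of the two RDMs."""
    iu = np.triu_indices(A.shape[0], k=1)
    return float(np.corrcoef(rdm(A)[iu], rdm(B)[iu])[0, 1])


# Synthetic stand-ins (assumption): a shared low-dimensional structure drives
# both the "model" hidden states and the "brain" voxel responses.
rng = np.random.default_rng(0)
n_stimuli, n_hidden, n_voxels, k = 60, 512, 200, 10
latent = rng.normal(size=(n_stimuli, 8))
hidden = latent @ rng.normal(size=(8, n_hidden)) \
    + 0.1 * rng.normal(size=(n_stimuli, n_hidden))
voxels = latent @ rng.normal(size=(8, n_voxels)) \
    + 0.1 * rng.normal(size=(n_stimuli, n_voxels))
# Control features with no shared structure, loosely analogous to a model
# that has not learned anything related to the stimuli.
control = rng.normal(size=(n_stimuli, n_hidden))

reduced = pca_reduce(hidden, k)
score_model = rsa_score(reduced, voxels)
score_control = rsa_score(pca_reduce(control, k), voxels)

print(f"reduced model vs. voxels:   {score_model:.3f}")
print(f"reduced control vs. voxels: {score_control:.3f}")
```

If the similarity survives the projection to `k` dimensions for the structured features but not for the control, the score cannot be explained by the high dimensionality of the hidden states alone, which is the logic behind point (i).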