Large language models (LLMs) are ubiquitous in AI research, yet the question of their embodiment remains underexplored, in contrast to embodied systems in robotics, where sensory perception directly informs physical action. We investigate whether LLMs, despite being non-embodied, capture implicit human intuitions about the fundamental spatial building blocks of language. Drawing on theories of spatial cognitive foundations that develop through early sensorimotor experience, we reproduce three psycholinguistic experiments. Surprisingly, model outputs correlate with human responses, which suggests that LLMs can adapt without any tangible connection to embodied experience. Notable differences include polarized language model responses and reduced correlations in vision language models. This research contributes to a more nuanced understanding of the interplay between language, spatial experience, and the computations performed by large language models. More at https://cisnlp.github.io/Spatial_Schemas/