To what extent can language alone give rise to complex concepts, or is embodied experience essential? Recent advances in large language models (LLMs) offer fresh perspectives on this question. Although LLMs are trained on restricted modalities, they exhibit human-like performance in diverse psychological tasks. Our study compared representations of 4,442 lexical concepts between humans and ChatGPT models (GPT-3.5 and GPT-4) across multiple dimensions covering five key domains: emotion, salience, mental visualization, sensory experience, and motor experience. We report two main findings: 1) Both models align strongly with human representations in non-sensorimotor domains but lag in the sensory and motor domains, with GPT-4 outperforming GPT-3.5; 2) GPT-4's gains are associated with its additional visual learning, which also appears to benefit related dimensions such as haptics and imageability. These results highlight the limitations of language in isolation and suggest that integrating diverse input modalities leads to more human-like conceptual representations.