Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction lets us study how people construe the world through a paradigm similar to the game of telephone: one person observes a stimulus and reproduces it for the next, forming a chain of reproductions. Past serial reproduction experiments typically employ a single sensory modality, but humans often communicate abstractions of the world to each other through language. To investigate the effect of language on the formation of abstractions, we implement a novel multimodal serial reproduction framework by asking people who receive a visual stimulus to reproduce it in a linguistic format, and vice versa. We ran unimodal and multimodal chains with both humans and GPT-4 and found that adding language as a modality has a larger effect on human reproductions than on GPT-4's. This suggests that human visual and linguistic representations are more dissociable than those of GPT-4.