Large language models (LLMs) are demonstrating remarkable capabilities across various tasks despite lacking a foundation in human cognition. This raises the question: can these models, beyond simply mimicking human language patterns, offer insights into the mechanisms underlying human cognition? This study explores the ability of ChatGPT to predict human performance in a language-based memory task. Building upon theories of text comprehension, we hypothesize that recognizing ambiguous sentences (e.g., "Because Bill drinks wine is never kept in the house") is facilitated by preceding them with contextually relevant information. Participants, both human and ChatGPT, were presented with pairs of sentences. The second sentence was always a garden-path sentence designed to be inherently ambiguous, while the first sentence provided either a fitting context (e.g., "Bill has chronic alcoholism") or an unfitting one (e.g., "Bill likes to play golf"). We measured humans' and ChatGPT's ratings of sentence relatedness, ChatGPT's memorability ratings for the garden-path sentences, and humans' spontaneous memory for the garden-path sentences. The results revealed a striking alignment between ChatGPT's assessments and human performance. Sentences that ChatGPT rated as more related and more memorable were indeed better remembered by humans, even though ChatGPT's internal mechanisms likely differ significantly from human cognition. This finding, which was confirmed with a robustness check employing synonyms, underscores the potential of generative AI models to predict human performance accurately. We discuss the broader implications of these findings for leveraging LLMs in the development of psychological theories and for gaining a deeper understanding of human cognition.