We explore the possibility that theory of mind (ToM), the uniquely human ability to impute unobservable mental states to others, might have spontaneously emerged in large language models (LLMs). We designed 40 false-belief tasks, considered a gold standard in testing ToM in humans, and administered them to several LLMs. Each task included a false-belief scenario, three closely matched true-belief controls, and the reversed versions of all four, for eight scenarios per task. Smaller and older models solved no tasks; GPT-3-davinci-001 (from May 2020) and GPT-3-davinci-002 (from January 2022) solved 10% of the tasks; and GPT-3-davinci-003 (from November 2022) and ChatGPT-3.5-turbo (from March 2023) solved 35% of the tasks, mirroring the performance of three-year-old children. ChatGPT-4 (from June 2023) solved 90% of the tasks, matching the performance of seven-year-old children. These findings suggest that ToM, previously considered exclusive to humans, may have spontaneously emerged as a byproduct of LLMs' improving language skills.