Recent large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM, and their similarities to those of humans, remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two: hidden embeddings (artificial neurons) within LLMs exhibited significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These embedding responses were closely correlated with the LLMs' performance on the ToM tasks, a property that depended on model size. Further, the other's beliefs could be accurately decoded from the entire set of embeddings, indicating that ToM-related information is present in the embeddings at the population level. Together, our findings reveal an emergent property of LLM embeddings, which modify their activity in response to ToM features, offering initial evidence of a parallel between the artificial models and neurons in the human brain.