Unveiling Theory of Mind in Large Language Models: A Parallel to Single Neurons in the Human Brain

With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLMs' capacity for ToM, and their similarity to those of humans, remain largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surprisingly, our analysis revealed a striking resemblance between the two: hidden embeddings (artificial neurons) within LLMs exhibited significant responsiveness to either true- or false-belief trials, suggesting their ability to represent another's perspective. These artificial embedding responses were closely correlated with the LLMs' performance during the ToM tasks, a property that depended on the size of the models. Further, the other's beliefs could be accurately decoded from the entire set of embeddings, indicating that ToM capability is present in the embeddings at the population level. Together, our findings reveal an emergent property of LLMs' embeddings, which modify their activity in response to ToM features, offering initial evidence of a parallel between the artificial model and neurons in the human brain.

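The abstract names two kinds of analysis: testing whether individual embedding units respond differently to true- and false-belief trials, and decoding the belief type from the whole embedding population. It does not say which statistical test or decoder was used, so the following is only a minimal sketch of the general idea on synthetic data. The permutation test, the nearest-centroid decoder, and every name and parameter below (for example `make_synthetic_trials` and `selective_units`) are assumptions made for illustration, not the authors' method.

```python
import random
import statistics


def make_synthetic_trials(n_trials=40, n_units=8, selective_units=(0, 3), seed=0):
    """Hypothetical stand-in for per-trial hidden embeddings.

    Label 1 = false-belief trial, label 0 = true-belief trial. Only the
    units listed in `selective_units` are shifted on false-belief trials.
    """
    rng = random.Random(seed)
    trials, labels = [], []
    for i in range(n_trials):
        label = i % 2
        vec = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        for u in selective_units:
            vec[u] += 2.0 * label
        trials.append(vec)
        labels.append(label)
    return trials, labels


def permutation_p_value(values, labels, n_perm=500, seed=1):
    """Two-sided permutation test on the difference of class means for one unit."""
    rng = random.Random(seed)

    def mean_diff(lbls):
        a = [v for v, l in zip(values, lbls) if l == 1]
        b = [v for v, l in zip(values, lbls) if l == 0]
        return abs(statistics.fmean(a) - statistics.fmean(b))

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_diff(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def loo_centroid_accuracy(trials, labels):
    """Leave-one-out accuracy of a nearest-centroid decoder over all units."""
    correct = 0
    for i, (x, y) in enumerate(zip(trials, labels)):
        dist = {}
        for c in (0, 1):
            rows = [t for j, (t, l) in enumerate(zip(trials, labels))
                    if j != i and l == c]
            centroid = [statistics.fmean(col) for col in zip(*rows)]
            dist[c] = sum((a - b) ** 2 for a, b in zip(x, centroid))
        correct += min(dist, key=dist.get) == y
    return correct / len(trials)


trials, labels = make_synthetic_trials()

# Single-unit analysis: which units differ between trial types?
# (No multiple-comparison correction is applied in this sketch.)
p_values = [permutation_p_value([t[u] for t in trials], labels)
            for u in range(len(trials[0]))]
selective = [u for u, p in enumerate(p_values) if p < 0.05]

# Population analysis: can the belief type be decoded from all units together?
accuracy = loo_centroid_accuracy(trials, labels)

print("belief-selective units:", selective)
print("leave-one-out decoding accuracy:", round(accuracy, 2))
```

In a real analysis the synthetic vectors would be replaced by hidden-state embeddings extracted from an LLM for each true- or false-belief prompt, and the per-unit p-values would normally be corrected for the number of units tested.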
