Theory-of-Mind (ToM), the ability to infer others' perceptions and mental states, is fundamental to human interaction but remains challenging for Large Language Models (LLMs). Existing ToM reasoning methods show promise by reasoning through perceptual perspective-taking, but they often rely excessively on LLMs, which reduces their efficiency and limits their applicability to high-order ToM reasoning, where multi-hop reasoning about characters' beliefs is required. To address these issues, we present EnigmaToM, a novel neuro-symbolic framework that enhances ToM reasoning by integrating a Neural Knowledge Base of entity states (Enigma) for (1) a psychology-inspired iterative masking mechanism that facilitates accurate perspective-taking and (2) knowledge injection that elicits key entity information. Enigma generates structured representations of entity states, which are used to construct spatial scene graphs, leveraging spatial information as an inductive bias, for belief tracking across ToM orders and to enrich events with fine-grained entity state details. Experimental results on multiple benchmarks, including ToMi, HiToM, and FANToM, show that EnigmaToM significantly improves ToM reasoning across LLMs of varying sizes, particularly excelling in high-order reasoning scenarios.
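To make the idea of iterative masking for high-order belief tracking concrete, the following is a minimal sketch and not the EnigmaToM implementation. It assumes each event is already annotated with the set of characters who witness it (information a spatial scene graph could supply), and all names (`Event`, `mask_events`, `believed_location`) are hypothetical. For a question such as "where does Anne think Sally thinks the marble is?", events are filtered one character at a time along the perspective chain.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    text: str
    present: frozenset                  # characters who witness the event (assumed given)
    entity: Optional[str] = None        # entity whose location changes, if any
    new_location: Optional[str] = None  # location of the entity after the event


def mask_events(events, perspective_chain):
    """Iteratively drop events not witnessed by each character in the chain."""
    visible = list(events)
    for character in perspective_chain:
        visible = [e for e in visible if character in e.present]
    return visible


def believed_location(events, perspective_chain, entity):
    """Return the last location of `entity` among the events left after masking."""
    location = None
    for e in mask_events(events, perspective_chain):
        if e.entity == entity:
            location = e.new_location
    return location


# Toy Sally-Anne style story (illustrative data only).
story = [
    Event("Sally puts the marble in the basket.",
          frozenset({"Sally", "Anne"}), "marble", "basket"),
    Event("Sally leaves the room.", frozenset({"Sally", "Anne"})),
    Event("Anne moves the marble to the box.",
          frozenset({"Anne"}), "marble", "box"),
]

print(believed_location(story, [], "marble"))                 # true state
print(believed_location(story, ["Sally"], "marble"))          # first-order belief
print(believed_location(story, ["Anne", "Sally"], "marble"))  # second-order belief
```

In this simplified form, masking reduces to intersecting witness sets, so the order of the chain does not change the result. A fuller treatment would also track what each character knows about the others' presence, which this sketch omits.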