Theory of Mind (ToM), the ability to understand people's minds based on their behavior, is key to developing socially intelligent agents. Current approaches to ToM reasoning either rely on prompting Large Language Models (LLMs), which are prone to systematic errors, or use handcrafted, rigid agent models for model-based inference, which are more robust but fail to generalize across domains. In this work, we introduce AutoToM, an automated agent modeling method for scalable, robust, and interpretable mental inference. Given a ToM problem, AutoToM first proposes an initial agent model and then performs automated Bayesian inverse planning based on this model, leveraging an LLM backend. Guided by inference uncertainty, it iteratively refines the model by introducing additional mental variables and/or incorporating more timesteps in the context. Across five diverse benchmarks, AutoToM outperforms existing ToM methods and even large reasoning models. Additionally, we show that AutoToM can produce human-like confidence estimates and enable online mental inference for embodied decision-making.
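The inference loop described above, posterior computation via Bayesian inverse planning followed by uncertainty-guided model refinement, can be sketched minimally as follows. This is an illustrative sketch, not AutoToM's implementation: the hypothesis strings, the `models` list of likelihood functions (standing in for LLM-scored likelihoods under progressively richer agent models), and the entropy threshold are all hypothetical.

```python
import math

def posterior(hypotheses, likelihood, prior=None):
    """One Bayesian inverse planning step: P(h | obs) ∝ P(obs | h) · P(h).

    `likelihood` maps a hypothesis to P(observed behavior | h); in AutoToM
    this score would come from an LLM backend, stubbed here."""
    prior = prior or {h: 1.0 / len(hypotheses) for h in hypotheses}
    unnorm = {h: likelihood(h) * prior[h] for h in hypotheses}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

def entropy(dist):
    """Inference uncertainty as Shannon entropy of the posterior (nats)."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def infer_mental_state(hypotheses, models, threshold=0.5):
    """Refine the agent model until the posterior is confident enough.

    `models` is an ordered list of likelihood functions, from the simplest
    agent model to richer ones (more mental variables, more timesteps)."""
    for likelihood in models:
        post = posterior(hypotheses, likelihood)
        if entropy(post) <= threshold:  # confident: stop refining the model
            break
    return max(post, key=post.get), post

# Toy usage: a vague initial model leaves the posterior uniform (high
# entropy), so the loop falls through to a richer, more discriminative model.
hypotheses = ["believes key is in drawer", "believes key is in box"]
vague_model = lambda h: 1.0                                   # uninformative
sharp_model = lambda h: 0.9 if "drawer" in h else 0.1         # discriminative
answer, post = infer_mental_state(hypotheses, [vague_model, sharp_model])
```

The posterior entropy doubles as the human-like confidence estimate mentioned above: a low-entropy posterior signals a confident inference, while a high-entropy one triggers refinement instead of a guess.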