Theory of Mind (ToM) capabilities in LLMs have recently become a central object of investigation. Cognitive science distinguishes two steps required for ToM tasks: 1) determining whether to invoke ToM, which includes identifying the appropriate Depth of Mentalizing (DoM), i.e., the level of recursion required to complete a task; and 2) applying the correct inference given that DoM. In this position paper, we first identify several lines of work in different AI communities, including LLM benchmarking, ToM add-ons, ToM probing, and formal models of ToM. We argue that recent work in AI tends to focus exclusively on the second step, typically framing ToM tasks as static logic problems. We conclude with suggestions for improved evaluation of ToM capabilities, inspired by the dynamic environments used in cognitive tasks.