Neural Theory-of-Mind (N-ToM), a machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings: ambiguous and artificial narratives, the absence of personality traits and preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' ability to model characters' mental states in both the physical and the psychological world. Using OpenToM, we reveal that state-of-the-art LLMs excel at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.