Neural Theory-of-Mind (N-ToM), a machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings, including ambiguous and artificial narratives, characters without personality traits or preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' capabilities of modeling characters' mental states in both the physical and psychological worlds. Using OpenToM, we reveal that state-of-the-art LLMs excel at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.