Existing dynamic Theory of Mind (ToM) benchmarks mostly place language models in a passive role: the model reads a sequence of connected scenarios and reports what people believe, feel, intend, and do as these states change. In real social interaction, ToM is also used for action: a speaker plans what to say in order to shift another person's mental-state trajectory toward a goal. We introduce SocialMindChange, a benchmark that moves from tracking minds to changing minds in social interaction. Each instance defines a social context with four characters and five connected scenes. The model plays one character and generates dialogue across the five scenes to reach a target mental-state goal while remaining consistent with the evolving states of all participants. SocialMindChange also includes selected higher-order mental states. Using a structured four-step framework, we construct 1,200 social contexts covering 6,000 scenes and over 90,000 questions, all validated for realism and quality. Evaluations of ten state-of-the-art LLMs show that their average performance is 54.2% below human performance. This gap suggests that current LLMs still struggle to maintain and change mental-state representations across long, linked interactions.