Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies

Social deduction games like Werewolf combine language, reasoning, and strategy, providing a testbed for studying natural language and social intelligence. However, most studies reduce the game to LLM-based self-play, yielding templated utterances and anecdotal cases that overlook the richness of social gameplay. Evaluation, in turn, relies on coarse metrics such as survival time or subjective scoring because high-quality reference data is lacking. To address these gaps, we curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. Based on this dataset, we propose a novel strategy-alignment evaluation that uses the winning faction's strategies as ground truth in two stages: 1) speech evaluation, formulated as multiple-choice-style tasks that assess whether the model can adopt appropriate stances across five dimensions of social ability; and 2) decision evaluation, which assesses the model's voting choices and opponent-role inferences. This framework enables a fine-grained evaluation of models' linguistic and reasoning capabilities while capturing their ability to generate strategically coherent gameplay. Our experiments show that state-of-the-art LLMs vary widely in performance, with roughly half remaining below 0.50, revealing clear gaps in deception and counterfactual reasoning. We hope our dataset further inspires research on language, reasoning, and strategy in multi-agent interaction.
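
As a rough illustration of the two-stage scoring described above, the following minimal sketch computes per-dimension accuracy for multiple-choice speech items and a simple match rate for decisions, treating the winning faction's choices as the reference. The abstract does not specify the metric, the data format, or the names of the five dimensions, so the record keys (`dimension`, `gold`, `pred`), the placeholder labels `dim_a` and `dim_b`, the player identifiers, and both function names are assumptions made only for this example.

```python
from collections import defaultdict


def speech_accuracy(items):
    """Per-dimension accuracy over multiple-choice speech items.

    Each item is a dict with hypothetical keys:
    'dimension' (social-ability dimension label),
    'gold' (option matching the winning faction's stance),
    'pred' (option chosen by the model).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        total[item["dimension"]] += 1
        if item["pred"] == item["gold"]:
            correct[item["dimension"]] += 1
    return {dim: correct[dim] / total[dim] for dim in total}


def decision_accuracy(pairs):
    """Fraction of (pred, gold) decisions, such as votes or role guesses, that match."""
    if not pairs:
        return 0.0
    return sum(pred == gold for pred, gold in pairs) / len(pairs)


# Toy data; the labels and player names are placeholders, not taken from the dataset.
speech_items = [
    {"dimension": "dim_a", "gold": "B", "pred": "B"},
    {"dimension": "dim_a", "gold": "C", "pred": "A"},
    {"dimension": "dim_b", "gold": "A", "pred": "A"},
]
votes = [("player_3", "player_3"), ("player_5", "player_2")]

speech_scores = speech_accuracy(speech_items)
vote_score = decision_accuracy(votes)
print(speech_scores)  # {'dim_a': 0.5, 'dim_b': 1.0}
print(vote_score)     # 0.5
```

Scores computed this way fall in the range 0 to 1, which fits the 0.50 threshold mentioned in the abstract, although the actual aggregation used in the paper may differ.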

