Deception and persuasion play a critical role in long-horizon dialogues between multiple parties, especially when the interests, goals, and motivations of the participants are not aligned. Such dialogues pose challenges for current Large Language Models (LLMs), which are easily misled by deception and persuasion. To this end, we explore the game of Avalon: The Resistance, a social deduction game in which players must determine each other's hidden identities to complete their team's objective. We introduce an online testbed and a dataset containing 20 carefully collected and labeled games among human players that exhibit long-horizon deception in a cooperative-competitive setting. We discuss the ability of LLMs to use deceptive long-horizon conversations among six human players to determine each player's goal and motivation. In particular, we discuss the multimodal integration of the players' chat with the game state that grounds the conversation, which provides further insight into the true player identities. We find that even current state-of-the-art LLMs do not reach human performance, making our dataset a compelling benchmark for investigating the decision-making and language-processing capabilities of LLMs. Our dataset and online testbed can be found at our project website: https://sstepput.github.io/Avalon-NLU/