Werewolf is a social deduction game played through free-form natural language conversation, in which players try to deceive one another in order to survive. A distinctive feature of the game is that a large portion of the conversation consists of false information, and the behavior of artificial intelligence (AI) in such settings has not been widely investigated. The purpose of this study is to develop an AI agent that can play Werewolf through natural language conversation. First, we collected game logs from 15 human players. Next, we fine-tuned a Transformer-based pretrained language model to construct a value network that predicts the posterior probability of winning the game, given any phase of the game and a candidate for the next action. We then developed an AI agent that interacts with humans and chooses the best voting target according to the win probability output by the value network. Lastly, we evaluated the agent by having it play the game with human players. We found that our AI agent, Deep Wolf, played Werewolf as competitively as average human players in the villager and betrayer roles, but was inferior to human players in the werewolf and seer roles. These results suggest that current language models have the capability to doubt what others say, tell lies, or detect lies in conversation.
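The vote-selection step described above can be illustrated with a minimal sketch. It assumes the value network is exposed as a function that maps a game state and a candidate action to a win probability. The names `choose_vote_target` and `toy_win_probability`, the text encoding of the game state, and the scores are all hypothetical stand-ins for illustration, not the authors' implementation.

```python
from typing import Callable, Sequence


def choose_vote_target(
    game_state: str,
    candidates: Sequence[str],
    win_probability: Callable[[str, str], float],
) -> str:
    """Return the candidate whose vote maximizes the estimated win probability.

    `win_probability` stands in for the fine-tuned value network: it takes the
    current game state and one candidate action and returns a probability.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")
    return max(candidates, key=lambda c: win_probability(game_state, c))


def toy_win_probability(game_state: str, candidate: str) -> float:
    # Hypothetical fixed scores used only to make the sketch runnable;
    # a real value network would compute these from the conversation log.
    scores = {"Alice": 0.35, "Bob": 0.62, "Carol": 0.48}
    return scores.get(candidate, 0.0)


best = choose_vote_target(
    "Day 2 discussion log (placeholder)",
    ["Alice", "Bob", "Carol"],
    toy_win_probability,
)
print(best)
```

In this sketch the agent scores each candidate independently and takes the argmax; how the actual agent encodes the game phase and conversation for the network is not specified here.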