This paper introduces Werewolf Arena, a novel framework for evaluating large language models (LLMs) through the lens of the classic social deduction game, Werewolf. In Werewolf Arena, LLMs compete against each other, navigating the game's complex dynamics of deception, deduction, and persuasion. The framework introduces a dynamic turn-taking system based on bidding, mirroring real-world discussions where individuals strategically choose when to speak. We demonstrate the framework's utility through an arena-style tournament featuring Gemini and GPT models. Our results reveal distinct strengths and weaknesses in the models' strategic reasoning and communication. These findings highlight Werewolf Arena's potential as a challenging and scalable LLM benchmark.
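The abstract does not specify how the bidding-based turn-taking works internally. As a minimal sketch of one plausible design (all function names and the bid scale here are hypothetical, not taken from the paper): each remaining player submits an integer bid expressing how urgently it wants to speak, the highest bidder takes the turn, and ties are broken at random.

```python
import random

def select_speaker(bids, rng=random):
    """Pick the next speaker: highest bid wins, ties broken at random.

    bids: dict mapping player name -> integer bid (higher = more urgent).
    """
    top = max(bids.values())
    contenders = [p for p, b in bids.items() if b == top]
    return rng.choice(contenders)

def run_discussion(players, bid_fn, turns, rng=random):
    """Simulate one discussion phase of `turns` speaking slots.

    bid_fn(player, history) -> int bid, where `history` is the ordered
    list of speakers so far (in practice this would be an LLM call).
    Returns the ordered list of speakers.
    """
    history = []
    for _ in range(turns):
        bids = {p: bid_fn(p, history) for p in players}
        history.append(select_speaker(bids, rng))
    return history
```

In a real game loop, `bid_fn` would prompt each model with the transcript so far and parse a bid from its response; the sketch above only shows the selection mechanics.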