Dungeons & Dragons (D&D) is a tabletop roleplaying game that involves complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) with access to state information can generate higher-quality game turns than LLMs that use dialog history alone. However, that work used game state information that was created heuristically and was not a true gold-standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state information. We recorded gameplay sessions of players who used the Avrae bot, which was developed to help people play D&D online, capturing language, game commands, and underlying game state information. We demonstrate that FIREBALL can improve natural language generation (NLG) by using Avrae state information, improving both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning.