Dungeons & Dragons (D&D) is a tabletop roleplaying game with complex natural language interactions between players and hidden state information. Recent work has shown that large language models (LLMs) with access to state information can generate higher-quality game turns than LLMs that use dialog history alone. However, previous work used game state information that was heuristically created and was not a true gold-standard game state. We present FIREBALL, a large dataset containing nearly 25,000 unique sessions from real D&D gameplay on Discord with true game state information. We recorded gameplay sessions of players who used the Avrae bot, which was developed to aid people in playing D&D online, capturing language, game commands, and underlying game state information. We demonstrate that using the Avrae state information in FIREBALL can improve natural language generation (NLG), as measured by both automated metrics and human judgments of quality. Additionally, we show that LLMs can generate executable Avrae commands, particularly after finetuning.