Advances in large language models (LLMs) are revolutionizing interactive game design, enabling dynamic plotlines and interactions between players and non-player characters (NPCs). However, LLMs may exhibit flaws such as hallucinations, forgetfulness, or misinterpretation of prompts, causing logical inconsistencies and unexpected deviations from the intended design. Automated techniques for detecting such game bugs are still lacking. To address this, we propose a systematic LLM-based method that automatically identifies these bugs from player game logs, eliminating the need to collect additional data such as post-play surveys. Applied to the text-based game DejaBoom!, our approach effectively identifies bugs inherent in LLM-powered interactive games, outperforming unstructured LLM-based bug-catching methods and filling the gap in automated detection of logical and design flaws.