Retrieval-Augmented Generation (RAG) systems are increasingly vital in dynamic domains such as online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, automating such a benchmark introduces a critical requirement for player-centric authenticity, so that generated questions are realistic. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay uses a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws on official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.