ChronoPlay: A Framework for Modeling Dual Dynamics and Authenticity in Game RAG Benchmarks

Retrieval-Augmented Generation (RAG) systems are increasingly vital in dynamic domains like online gaming, yet the lack of a dedicated benchmark has impeded standardized evaluation in this area. The core difficulty lies in Dual Dynamics: the constant interplay between game content updates and the shifting focus of the player community. Furthermore, automating such a benchmark requires player-centric authenticity, so that generated questions resemble what real players ask. To address this integrated challenge, we introduce ChronoPlay, a novel framework for the automated and continuous generation of game RAG benchmarks. ChronoPlay uses a dual-dynamic update mechanism to track both forms of change, and a dual-source synthesis engine that draws on official sources and the player community to ensure both factual correctness and authentic query patterns. We instantiate our framework on three distinct games to create the first dynamic RAG benchmark for the gaming domain, offering new insights into model performance under these complex and realistic conditions. Code is available at: https://github.com/hly1998/ChronoPlay.
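The abstract does not specify how the dual-dynamic update mechanism works, so the following is only a minimal sketch of the idea, not ChronoPlay's implementation. It assumes that content change can be detected by comparing document versions and that a shift in community focus can be measured as the total variation distance between two topic-count distributions. Every name here (`Question`, `stale_questions`, `topic_shift`, the document ids) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Question:
    """A benchmark question (hypothetical schema, not from the paper)."""
    text: str
    topic: str
    source_doc: str   # id of the official document the answer relies on
    doc_version: int  # version of that document when the question was written


def stale_questions(questions, current_versions):
    """Content dynamics: return questions whose source document has changed."""
    return [
        q for q in questions
        if current_versions.get(q.source_doc, q.doc_version) != q.doc_version
    ]


def topic_shift(old_counts, new_counts):
    """Community dynamics: total variation distance between topic distributions.

    Returns a value in [0, 1]; 0 means the community focus is unchanged.
    """
    topics = set(old_counts) | set(new_counts)
    old_total = sum(old_counts.values()) or 1
    new_total = sum(new_counts.values()) or 1
    return 0.5 * sum(
        abs(old_counts.get(t, 0) / old_total - new_counts.get(t, 0) / new_total)
        for t in topics
    )


# Toy data: one document was patched, and player interest moved toward quests.
questions = [
    Question("How much damage does the bow deal?", "weapons", "patch_notes", 1),
    Question("Where does the main quest start?", "quests", "quest_guide", 2),
]
current_versions = {"patch_notes": 2, "quest_guide": 2}

to_refresh = stale_questions(questions, current_versions)
shift = topic_shift({"weapons": 3, "quests": 1}, {"weapons": 1, "quests": 3})
```

In this sketch a benchmark refresh would be triggered either by a non-empty `to_refresh` list or by `shift` exceeding some chosen threshold; the threshold and the topic taxonomy are left open because the abstract does not describe them.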