Resources for simulation-based evaluation of conversational recommender systems (CRSs) are scarce. The UserSimCRS toolkit was introduced to address this gap. In this work, we present UserSimCRS v2, a significant upgrade that aligns the toolkit with state-of-the-art research. Key extensions include an enhanced agenda-based user simulator, the introduction of large language model (LLM)-based simulators, integration with a wider range of CRSs and datasets, and new LLM-as-a-judge evaluation utilities. We demonstrate these extensions in a case study.