We introduce CRS Arena, a research platform for scalable benchmarking of Conversational Recommender Systems (CRSs) based on human feedback. The platform presents pairwise battles between anonymous conversational recommender systems, in which users interact with each system in turn before declaring a winner or a draw. CRS Arena collects the conversations and user feedback, providing a foundation for reliable evaluation and ranking of CRSs. We conduct experiments with CRS Arena on both open and closed crowdsourcing platforms, confirming that both setups produce highly correlated CRS rankings and yield conversations with similar characteristics. We release CRSArena-Dial, a dataset of 474 conversations and their corresponding user feedback, along with a preliminary ranking of the systems based on the Elo rating system. The platform is accessible at https://iai-group-crsarena.hf.space/.
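The abstract states that systems are ranked with the Elo rating system from pairwise battle outcomes (win, loss, or draw). The following is a minimal sketch, not the paper's implementation, of how such ratings could be computed from logged battles; the K-factor of 32, the initial rating of 1000, and the system names in the example are assumptions for illustration only.

```python
# Minimal sketch of Elo rating updates from pairwise CRS battles with draws.
# K-factor, initial rating, and system names are assumed, not from the paper.
from collections import defaultdict

K = 32               # assumed update step size
INIT_RATING = 1000.0  # assumed initial rating for every system


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of system A against system B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(battles, k: float = K) -> dict:
    """Compute Elo ratings from (system_a, system_b, outcome) tuples,
    where outcome is 1.0 if A wins, 0.0 if B wins, and 0.5 for a draw."""
    ratings = defaultdict(lambda: INIT_RATING)
    for sys_a, sys_b, outcome in battles:
        exp_a = expected_score(ratings[sys_a], ratings[sys_b])
        ratings[sys_a] += k * (outcome - exp_a)
        ratings[sys_b] += k * ((1.0 - outcome) - (1.0 - exp_a))
    return dict(ratings)


# Example with three hypothetical battles between two anonymous systems.
battles = [("crs_1", "crs_2", 1.0), ("crs_1", "crs_2", 0.5), ("crs_1", "crs_2", 0.0)]
print(update_elo(battles))
```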