What Else Would I Like? A User Simulator using Alternatives for Improved Evaluation of Fashion Conversational Recommendation Systems

In Conversational Recommendation Systems (CRS), a user can provide feedback on recommended items at each interaction turn, leading the CRS towards more desirable recommendations. Currently, different types of CRS offer various possibilities for feedback, e.g., natural language feedback or answers to clarifying questions. In most cases, a user simulator is employed for training as well as evaluating the CRS. Such user simulators typically critique the currently retrieved items based on knowledge of a single target item. However, offline evaluation with such simulators has limitations: the simulator focuses entirely on a single target item, which ignores the exploratory nature of a recommender system, and it exhibits extreme patience, giving consistent feedback over a large number of turns. To overcome these limitations, we obtain extra judgements for a selection of alternative items in two common CRS datasets, namely Shoes and Fashion IQ Dresses. Going further, we propose improved user simulators that allow simulated users not only to express preferences about alternatives to their original target item, but also to change their mind and to vary their level of patience. In our experiments using the relative image captioning CRS setting and different CRS models, we find that giving the simulator knowledge of alternatives can have a considerable impact on the evaluation of existing CRS models. Specifically, the existing single-target evaluation underestimates their effectiveness, and when simulated users are allowed to consider alternatives instead, the system can respond rapidly and satisfy the user sooner.
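As a rough illustration of the idea, the sketch below models a simulated user who accepts either the original target or any judged alternative, and who gives up once a patience budget is exhausted. It is a minimal toy under stated assumptions, not the simulator proposed in the paper: the names `SimulatedUser`, `respond`, and `turns_to_success`, the string item identifiers, and the fixed turn budget are all hypothetical, and the natural language critique itself is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class SimulatedUser:
    """Toy simulated user (hypothetical; not the paper's implementation)."""
    target: str                                            # original target item
    alternatives: Set[str] = field(default_factory=set)    # judged-acceptable alternatives
    patience: int = 10                                     # assumed maximum number of turns

    def respond(self, turn: int, recommended: List[str]) -> Tuple[str, Optional[str]]:
        """Return ('accept', item), ('quit', None) or ('critique', None)."""
        acceptable = {self.target} | self.alternatives
        for item in recommended:
            if item in acceptable:
                return ("accept", item)
        if turn >= self.patience:
            return ("quit", None)
        # A real simulator would produce a critique of the shown items here.
        return ("critique", None)


def turns_to_success(user: SimulatedUser, ranked_lists: List[List[str]]) -> Optional[int]:
    """Number of turns until the user accepts an item, or None if the user quits."""
    for turn, recommended in enumerate(ranked_lists, start=1):
        action, _ = user.respond(turn, recommended)
        if action == "accept":
            return turn
        if action == "quit":
            return None
    return None


# Three turns of made-up recommendations; "t" is the target, "c" an alternative.
session = [["a", "b"], ["c", "d"], ["t", "e"]]
single_target = turns_to_success(SimulatedUser("t"), session)
with_alternatives = turns_to_success(SimulatedUser("t", {"c"}), session)
impatient = turns_to_success(SimulatedUser("t", patience=1), session)
```

In this toy session the single-target user needs three turns, while the user who also accepts the alternative succeeds in two, which mirrors the abstract's point that single-target evaluation can understate how quickly a system satisfies a user.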
