Existing multimodal task-oriented dialog datasets fail to capture the diverse ways users express subjective preferences, or the recommendation acts that occur, in real-life shopping scenarios. This paper introduces a new dataset, SURE (Multimodal Recommendation Dialog with SUbjective PREference), which contains 12K shopping dialogs set in complex store scenes. The data is built in two phases with human annotation to ensure quality and diversity. SURE is annotated with subjective preferences and recommendation acts proposed by sales experts. A comprehensive analysis reveals the distinguishing features of SURE. Three benchmark tasks are then proposed on the data to evaluate the capabilities of multimodal recommendation agents. Based on SURE, we propose a baseline model, powered by a state-of-the-art multimodal model, for these tasks.