Conversational Recommender Systems (CRS) elicit user preferences through multi-round interactive dialogues, with the goal of arriving at precise and satisfactory recommendations. However, existing CRS can only ask binary or multiple-choice questions about a single attribute type (e.g., color) per round, which leads to an excessive number of interaction rounds and degrades the user experience. To address this, we propose a more realistic and efficient conversational recommendation problem setting, called Multi-Type-Attribute Multi-round Conversational Recommendation (MTAMCR), which enables a CRS to ask multiple-choice questions covering multiple attribute types in each round, thereby improving interaction efficiency. Moreover, by formulating MTAMCR as a hierarchical reinforcement learning task, we propose a Chain-of-Choice Hierarchical Policy Learning (CoCHPL) framework to improve both questioning efficiency and recommendation effectiveness in MTAMCR. Specifically, a long-term policy over options (i.e., ask or recommend) determines the action type, while two short-term intra-option policies sequentially generate the chain of attributes or items through multi-step reasoning and selection, optimizing the diversity and interdependence of the attributes asked about. Finally, extensive experiments on four benchmarks demonstrate that CoCHPL outperforms prevailing state-of-the-art methods.
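The two-level control flow described above can be illustrated with a minimal sketch. It is not the CoCHPL implementation: the learned policies are replaced by hand-written rules, and every name here (`choose_option`, `build_chain`, `take_turn`, the toy `ATTRS` and `ITEMS` tables, the thresholds) is a hypothetical stand-in chosen for illustration. The sketch only shows how an option is picked first and a chain is then grown one choice at a time, with each choice conditioned on the chain so far.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Toy data (assumption): attribute -> (attribute type, preference score).
ATTRS: Dict[str, Tuple[str, float]] = {
    "red": ("color", 0.9),
    "blue": ("color", 0.8),
    "nike": ("brand", 0.7),
    "cotton": ("material", 0.6),
}
# Toy data (assumption): item -> relevance score.
ITEMS: Dict[str, float] = {"item_a": 0.9, "item_b": 0.4, "item_c": 0.7}


def choose_option(num_candidate_items: int, recommend_when_at_most: int = 3) -> str:
    """Stand-in for the long-term policy over options: ask or recommend."""
    return "recommend" if num_candidate_items <= recommend_when_at_most else "ask"


def build_chain(
    candidates: Sequence[str],
    score: Callable[[List[str], str], float],
    max_len: int,
    stop_threshold: float,
) -> List[str]:
    """Stand-in for an intra-option policy: grow a chain one choice per step.

    Each step scores the remaining candidates given the chain so far, and the
    chain terminates when the best score drops below stop_threshold.
    """
    chain: List[str] = []
    remaining = list(candidates)
    while remaining and len(chain) < max_len:
        best = max(remaining, key=lambda c: score(chain, c))
        if score(chain, best) < stop_threshold:
            break  # termination: nothing left is worth adding
        chain.append(best)
        remaining.remove(best)
    return chain


def attribute_score(chain: List[str], attr: str) -> float:
    """Penalize attribute types already in the chain to encourage diversity."""
    attr_type, base = ATTRS[attr]
    same_type = sum(1 for chosen in chain if ATTRS[chosen][0] == attr_type)
    return base - 0.5 * same_type


def item_score(chain: List[str], item: str) -> float:
    """Items are scored independently of the chain in this toy version."""
    return ITEMS[item]


def take_turn(num_candidate_items: int) -> Tuple[str, List[str]]:
    """One conversational round: pick an option, then generate its chain."""
    option = choose_option(num_candidate_items)
    if option == "ask":
        return option, build_chain(list(ATTRS), attribute_score, 4, 0.5)
    return option, build_chain(list(ITEMS), item_score, 2, 0.5)


if __name__ == "__main__":
    print(take_turn(50))
    print(take_turn(2))
```

In this toy setup, the same-type penalty is what makes a single question span several attribute types: once one color has been chosen, a second color scores lower than an attribute of an unused type. A learned intra-option policy would replace `attribute_score` and `item_score`, and a learned policy over options would replace the candidate-count rule in `choose_option`.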