Conversational recommendation systems (CRS) aim to acquire users' dynamic preferred attributes through conversation, in a timely and proactive manner, for item recommendation. Each turn of a CRS naturally involves two decision-making processes with different roles that influence each other: 1) the director, which selects the follow-up option (i.e., ask or recommend) that is more effective for reducing the action space and acquiring user preferences; and 2) the actor, which accordingly chooses primitive actions (i.e., the attribute to ask about or the item to recommend) that satisfy user preferences and gives feedback used to estimate the effectiveness of the director's option. However, existing methods rely heavily on a unified decision-making module or heuristic rules, neglecting both the distinct roles of the different decision procedures and the mutual influence between them. To address this, we propose a novel Director-Actor Hierarchical Conversational Recommender (DAHCR), in which the director selects the most effective option and the actor then chooses primitive actions that satisfy user preferences. Specifically, we develop a dynamic hypergraph to model user preferences and introduce an intrinsic motivation to train the director from weak supervision. Finally, to alleviate the adverse effect of model bias on the mutual influence between the director and actor, we model the director's option by sampling from a categorical distribution. Extensive experiments demonstrate that DAHCR outperforms state-of-the-art methods.
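The two-level decision described in the abstract can be illustrated with a minimal sketch. It is not the DAHCR implementation: the abstract does not say how the director's scores are produced or how the actor ranks candidates, so the option logits, the score dictionaries, and all function names below are hypothetical. The sketch only shows the general pattern of a director sampling its option (ask or recommend) from a categorical distribution rather than taking the argmax, followed by an actor choosing a primitive action under that option.

```python
import math
import random

# The two options available to the director, as named in the abstract.
OPTIONS = ("ask", "recommend")


def softmax(logits):
    """Turn raw scores into a categorical distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def director_sample(option_logits, rng):
    """Sample the director's option from a categorical distribution.

    Assumption: option_logits holds one score per entry of OPTIONS.
    """
    probs = softmax(option_logits)
    return rng.choices(OPTIONS, weights=probs, k=1)[0]


def actor_choose(option, attribute_scores, item_scores):
    """Pick a primitive action under the director's option.

    Assumption: the actor simply takes the highest-scoring candidate;
    the real actor policy is not specified in the abstract.
    """
    candidates = attribute_scores if option == "ask" else item_scores
    return max(candidates, key=candidates.get)


def one_turn(option_logits, attribute_scores, item_scores, rng):
    """Run one conversational turn: director first, then actor."""
    option = director_sample(option_logits, rng)
    action = actor_choose(option, attribute_scores, item_scores)
    return option, action


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical scores, for illustration only.
    attributes = {"genre:jazz": 0.7, "era:1990s": 0.4}
    items = {"item_12": 0.9, "item_7": 0.3}
    print(one_turn([0.2, -0.1], attributes, items, rng))
```

Sampling keeps the lower-probability option reachable, so an overconfident director score does not permanently lock the actor into one branch; this is one plausible reading of how sampling could limit the effect of model bias mentioned in the abstract.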