Conversational recommendation systems elicit user preferences by interacting with users and collecting their feedback on recommended items. Such systems use a multi-armed bandit framework to learn user preferences online and have achieved great success in recent years. However, existing conversational bandit methods have several limitations. First, they only allow users to give explicit binary feedback on recommended items or categories, which leaves the feedback open to ambiguous interpretation. In practice, users usually face more than one choice, and relative feedback, known for its informativeness, has gained increasing popularity in recommendation system design. Second, current contextual bandit methods mainly rely on linear reward assumptions and ignore the non-linear reward structures that arise in practice, such as those of generalized linear models. Therefore, in this paper, we introduce relative feedback-based conversations into conversational recommendation systems by integrating dueling bandits in generalized linear models (GLM), and we propose a novel conversational dueling bandit algorithm called ConDuel. Theoretical analysis of regret upper bounds and empirical validation on synthetic and real-world data demonstrate ConDuel's efficacy. We also show that our algorithm can be extended to multinomial logit bandits, with theoretical guarantees and experimental support, which further demonstrates the applicability of the proposed framework.
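To make the notion of relative feedback under a generalized linear model concrete, the following is a minimal sketch and not the ConDuel algorithm itself. It assumes a logistic link applied to the difference of two item feature vectors, which is one common GLM choice for dueling bandits; the abstract does not state which link function is used. The names `preference_probability`, `simulate_duel`, `theta`, `x_a`, and `x_b` are hypothetical and chosen for illustration only.

```python
import math
import random


def preference_probability(theta, x_a, x_b):
    """Probability that item a is preferred over item b.

    Assumption: a logistic link on the linear score theta^T (x_a - x_b).
    """
    score = sum(t * (a - b) for t, a, b in zip(theta, x_a, x_b))
    return 1.0 / (1.0 + math.exp(-score))


def simulate_duel(theta, x_a, x_b, rng):
    """Draw one relative-feedback outcome: 1 if a is chosen, 0 if b is chosen."""
    return 1 if rng.random() < preference_probability(theta, x_a, x_b) else 0


# Hypothetical user preference vector and two item feature vectors.
theta = [1.0, -0.5]
x_a = [0.8, 0.1]
x_b = [0.2, 0.4]

rng = random.Random(0)  # fixed seed so the simulation is reproducible
n_rounds = 10_000
p = preference_probability(theta, x_a, x_b)
wins = sum(simulate_duel(theta, x_a, x_b, rng) for _ in range(n_rounds))
print(f"model P(a preferred) = {p:.3f}, empirical rate = {wins / n_rounds:.3f}")
```

Unlike binary feedback on a single item, each simulated duel only reveals which of the two items the user prefers, so a learner would have to estimate `theta` from comparisons alone.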