Recommender systems trained on offline historical user behavior are increasingly adopting conversational techniques to query user preferences online. Unlike prior conversational recommendation approaches, which systematically combine the conversational and recommender components through a reinforcement learning framework, we propose CORE, a new offline-training and online-checking paradigm that bridges a COnversational agent and REcommender systems via a unified uncertainty minimization framework. CORE can benefit any recommendation platform in a plug-and-play fashion. It treats the recommender system as an offline relevance score estimator that produces an estimated relevance score for each item, and the conversational agent as an online relevance score checker that checks these estimated scores in each session. We define uncertainty as the sum of unchecked relevance scores; the conversational agent then acts to minimize this uncertainty by querying either attributes or items. Based on this uncertainty minimization framework, we derive the expected certainty gain of querying each attribute and item, and develop a novel online decision tree algorithm to decide what to query at each turn. Experimental results on 8 industrial datasets show that CORE can be seamlessly applied to 9 popular recommendation approaches. We further demonstrate that our conversational agent can communicate like a human when empowered by a pre-trained large language model.
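The uncertainty definition and the per-turn query decision described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the binary yes/no attribute model, and the assumption that the probability of a "yes" answer is proportional to the relevance mass of the matching items are all hypothetical choices made for illustration. The abstract only states that uncertainty is the sum of unchecked relevance scores and that the agent picks the attribute or item query with the best expected certainty gain.

```python
from typing import Dict, Set, Tuple


def uncertainty(scores: Dict[str, float], unchecked: Set[str]) -> float:
    """Uncertainty = sum of estimated relevance scores of unchecked items."""
    return sum(scores[i] for i in unchecked)


def item_gain(scores: Dict[str, float], unchecked: Set[str], item: str) -> float:
    """Assumption: querying one item checks exactly that item's score."""
    return scores[item] if item in unchecked else 0.0


def attribute_gain(
    scores: Dict[str, float], unchecked: Set[str], has_attr: Set[str]
) -> float:
    """Expected certainty gain of a yes/no attribute query (assumed model).

    A "yes" checks off every unchecked item lacking the attribute; a "no"
    checks off every unchecked item having it. P(yes) is assumed to be the
    share of relevance mass held by items with the attribute.
    """
    total = uncertainty(scores, unchecked)
    if total == 0:
        return 0.0
    with_attr = sum(scores[i] for i in unchecked if i in has_attr)
    without_attr = total - with_attr
    p_yes = with_attr / total
    return p_yes * without_attr + (1 - p_yes) * with_attr


def best_query(
    scores: Dict[str, float],
    unchecked: Set[str],
    attributes: Dict[str, Set[str]],
) -> Tuple[str, str, float]:
    """Greedy one-turn choice: the query with the largest expected gain."""
    candidates = [
        ("item", i, item_gain(scores, unchecked, i)) for i in sorted(unchecked)
    ]
    candidates += [
        ("attribute", a, attribute_gain(scores, unchecked, items))
        for a, items in sorted(attributes.items())
    ]
    return max(candidates, key=lambda c: c[2])


# Hypothetical toy session: four items with equal estimated relevance.
scores = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
unchecked = set(scores)
attributes = {"red": {"a", "b"}, "large": {"a", "b", "c"}}

print(best_query(scores, unchecked, attributes))  # ('attribute', 'red', 0.5)
```

In this toy session the attribute that splits the relevance mass evenly wins, since either answer checks off half of the remaining uncertainty, whereas querying a single item checks off only a quarter. Repeating the greedy choice on the shrinking unchecked set after each answer gives a simple online decision procedure in the spirit of the one the abstract describes.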