Retrieval Augmented Conversational Recommendation with Reinforcement Learning

Large language models (LLMs) exhibit strong capabilities in language understanding and generation. By drawing on their embedded knowledge, LLMs are increasingly used as conversational recommender systems (CRS), achieving improved performance across diverse scenarios. However, existing LLM-based methods rely on pretrained knowledge and lack an external retrieval mechanism for novel items. In addition, the absence of a unified corpus makes it difficult to integrate retrieval augmentation into CRS. Motivated by these challenges, we present RAR, a novel two-stage retrieval-augmented conversational recommendation framework that aligns retrieval and generation to enhance both performance and factuality. To support this framework and provide a unified corpus, we construct a large-scale movie corpus comprising over 300k movies with rich metadata, such as titles, casts, and plot summaries. Leveraging this data, our primary contribution is RAR, the first framework to depart from standard two-stage CRS by dynamically bridging retrieval and generation. In the first stage, a retriever model generates candidate items based on user history; in the second stage, an LLM refines the recommendations by combining the conversational context with the retrieved results. We further introduce a novel reinforcement learning (RL) method that uses LLM feedback to iteratively update the retriever. By creating a collaborative feedback loop that reinforces sampled candidate sets with higher ranking metrics, RAR mitigates the misalignment between the retrieval and generation stages. Furthermore, grounding the LLM in factual metadata allows our RL-driven approach to capture subtle user intentions and generate context-aware recommendations with reduced hallucinations. We validate our approach through extensive experiments on multiple benchmarks, where RAR consistently outperforms state-of-the-art baseline methods.
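The feedback loop described above can be illustrated with a toy sketch. The abstract does not specify the RL algorithm, so this example assumes a plain REINFORCE update with a moving-average baseline and sequential (Plackett-Luce style) sampling of candidate sets. The LLM is replaced by a hypothetical stand-in, `mock_llm_rerank`, and every name and hyperparameter here is illustrative rather than taken from RAR.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_candidates(logits, k, rng):
    """Stage 1 (assumed form): sample k distinct items without replacement.

    Returns the sampled items and the gradient of the log-probability of
    the sampled sequence with respect to the retriever logits.
    """
    remaining = list(range(len(logits)))
    chosen = []
    grad = [0.0] * len(logits)
    for _ in range(k):
        probs = softmax([logits[i] for i in remaining])
        pick = rng.choices(range(len(remaining)), weights=probs)[0]
        # d log p(pick) / d logit_j = 1[j == pick] - p_j over remaining items
        for pos, item in enumerate(remaining):
            grad[item] += (1.0 if pos == pick else 0.0) - probs[pos]
        chosen.append(remaining.pop(pick))
    return chosen, grad


def mock_llm_rerank(candidates, target):
    """Stage 2 stand-in (hypothetical): an 'LLM' that ranks the ground-truth
    item first whenever the retriever supplied it."""
    return sorted(candidates, key=lambda item: item != target)


def reciprocal_rank(ranked, target):
    """Ranking metric used as the reward: 1/rank of the target, 0 if absent."""
    return 1.0 / (ranked.index(target) + 1) if target in ranked else 0.0


def train_retriever(num_items=20, target=7, k=3, steps=300, lr=0.5, seed=0):
    """REINFORCE loop: reinforce candidate sets that yield better rankings."""
    rng = random.Random(seed)
    logits = [0.0] * num_items
    baseline = 0.0  # moving-average reward, used to reduce variance
    for _ in range(steps):
        candidates, grad = sample_candidates(logits, k, rng)
        reward = reciprocal_rank(mock_llm_rerank(candidates, target), target)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        logits = [w + lr * advantage * g for w, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    trained = train_retriever()
    print("retrieval probability of target:", softmax(trained)[7])
```

In this toy setting the retriever has one free logit per item and no user-history encoder. A realistic retriever would compute the logits from the conversation or interaction history, and the reward would come from the ranking quality of an actual LLM's output.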
