Large language models (LLMs) exhibit strong capabilities in language understanding and generation. Drawing on their embedded knowledge, LLMs are increasingly used as conversational recommender systems (CRS) and achieve improved performance across diverse scenarios. However, existing LLM-based methods rely solely on pretrained knowledge and lack an external retrieval mechanism for novel items. In addition, the absence of a unified corpus makes it difficult to integrate retrieval augmentation into CRS. Motivated by these challenges, we present RAR, a novel two-stage retrieval-augmented conversational recommendation framework that aligns retrieval and generation to improve both performance and factuality. To support this framework, we construct a large-scale unified movie corpus comprising over 300k movies with rich metadata such as titles, casts, and plot summaries. Leveraging this data, our primary contribution is RAR, the first framework to depart from standard two-stage CRS by dynamically bridging retrieval and generation. In the first stage, a retriever model generates candidate items based on user history; in the second stage, an LLM refines the recommendations by combining the conversational context with the retrieved results. We further introduce a novel reinforcement learning (RL) method that uses LLM feedback to iteratively update the retriever. By creating a collaborative feedback loop that reinforces sampled candidate sets with higher ranking metrics, RAR mitigates the misalignment between the retrieval and generation stages. Furthermore, grounding the LLM in factual metadata allows our RL-driven approach to capture subtle user intentions and generate context-aware recommendations with fewer hallucinations. Extensive experiments on multiple benchmarks show that RAR consistently outperforms state-of-the-art baseline methods.
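The feedback loop described above, in which sampled candidate sets that score higher on a ranking metric are reinforced, can be illustrated with a minimal policy-gradient sketch. This is not RAR's actual training procedure. It assumes a toy retriever that holds one learnable logit per item, samples candidate sets without replacement (Plackett-Luce), and is updated with a REINFORCE-style rule. The function `stub_llm_reward` is a hypothetical stand-in for LLM feedback that simply returns hit@k for a known target item, and all names and hyperparameters are illustrative.

```python
import math
import random


def sample_candidates(logits, k, rng):
    """Sample k distinct items (Plackett-Luce) and return them together with
    the gradient of the sample's log-probability w.r.t. every logit."""
    remaining = list(range(len(logits)))
    grad = [0.0] * len(logits)
    chosen = []
    for _ in range(k):
        weights = [math.exp(logits[i]) for i in remaining]
        total = sum(weights)
        for i, w in zip(remaining, weights):
            grad[i] -= w / total  # minus the draw probability of each available item
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        grad[pick] += 1.0  # plus one for the item actually drawn
        chosen.append(pick)
        remaining.remove(pick)
    return chosen, grad


def stub_llm_reward(candidates, target):
    """Hypothetical stand-in for LLM feedback: hit@k for the target item.
    A real system would score the LLM's reranked list instead."""
    return 1.0 if target in candidates else 0.0


def train_retriever(num_items=20, target=7, k=3, steps=500, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline on the toy retriever's logits."""
    rng = random.Random(seed)
    logits = [0.0] * num_items
    baseline = 0.0
    for _ in range(steps):
        candidates, grad = sample_candidates(logits, k, rng)
        reward = stub_llm_reward(candidates, target)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Raise the probability of candidate sets that beat the baseline.
        logits = [l + lr * advantage * g for l, g in zip(logits, grad)]
    return logits


if __name__ == "__main__":
    learned = train_retriever()
    ranked = sorted(range(len(learned)), key=lambda i: -learned[i])
    print("top-3 items after training:", ranked[:3])
```

In this toy setting the reward depends only on whether the target appears in the sampled set, so the update concentrates the retriever's probability mass on that item. Replacing the stub with a metric computed from an LLM's reranked output would tie the retriever's update to the generation stage, which is the alignment idea the abstract describes.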