Generative retrieval (GR) has emerged as a promising paradigm in recommendation systems that autoregressively decodes the identifiers of target items. Despite its potential, current approaches typically rely on the next-token prediction schema, which treats each token of the next interacted item as the sole target. This narrow focus 1) limits their ability to capture the nuanced structure of user preferences, and 2) overlooks the deep interaction between decoded identifiers and user behavior sequences. In response to these challenges, we propose RankGR, a Rank-enhanced Generative Retrieval method that incorporates listwise direct preference optimization for recommendation. RankGR decomposes the retrieval process into two complementary stages: the Initial Assessment Phase (IAP) and the Refined Scoring Phase (RSP). In IAP, we incorporate a novel listwise direct preference optimization strategy into GR, facilitating a more comprehensive understanding of hierarchical user preferences and more effective partial-order modeling. The RSP then refines the top-λ candidates generated by IAP using a lightweight scoring module that models their interactions with the input behavior sequences, leading to more precise candidate evaluation. Both phases are jointly optimized under a unified GR model, ensuring consistency and efficiency. Additionally, we implement several practical improvements in training and deployment, ultimately achieving a real-time system capable of handling nearly ten thousand requests per second. Extensive offline experiments on both research and industrial datasets, as well as online gains in the "Guess You Like" section of Taobao, validate the effectiveness and scalability of RankGR.
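To make the listwise direct preference optimization idea in IAP concrete, the following is a minimal sketch of one common listwise DPO formulation (a Plackett-Luce extension of the pairwise DPO objective). The abstract does not give RankGR's exact loss, so the function name, the `beta` temperature, and the Plackett-Luce decomposition are illustrative assumptions, not the paper's definition.

```python
import numpy as np

def listwise_dpo_loss(policy_logps, ref_logps, beta=0.1):
    """Plackett-Luce-style listwise DPO loss (illustrative sketch).

    policy_logps / ref_logps: per-candidate sequence log-probabilities under
    the trained GR model and a frozen reference model, ordered from most- to
    least-preferred. Unlike next-token prediction, the loss scores the whole
    preference ordering, capturing partial-order structure among candidates.
    """
    # Implicit reward: scaled log-ratio between policy and reference model.
    rewards = beta * (np.asarray(policy_logps) - np.asarray(ref_logps))
    loss = 0.0
    for i in range(len(rewards) - 1):
        # At each rank, the preferred item competes against all items below it.
        tail = rewards[i:]
        loss -= tail[0] - np.log(np.sum(np.exp(tail)))  # log-softmax at rank i
    return loss / (len(rewards) - 1)
```

A policy that assigns higher likelihood to better-ranked candidates (relative to the reference) attains a lower loss than one that inverts the ordering, which is the partial-order signal the pointwise next-token objective cannot express.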