In large-scale industrial recommendation systems, the retrieval stage must produce high-quality candidates from massive corpora under strict latency constraints. Generative Retrieval (GR) has recently emerged as a viable alternative to Embedding-Based Retrieval (EBR): GR quantizes items into a finite token space and decodes candidates autoregressively, providing a scalable path that explicitly models target-history interactions via cross-attention. However, deploying GR in short-video feeds still faces three challenges: interference between long-term and short-term interests, context-induced noise in hierarchical semantic ID (SID) generation, and the lack of explicit learning from exposed-but-unclicked feedback. To address them, we propose DualGR, which combines (i) a Dual-Branch Long/Short-Term Router (DBR) with selective activation, (ii) Search-based SID Decoding (S2D), which constrains fine-level decoding to the current coarse bucket for efficiency and noise control, and (iii) an Exposure-aware Next-Token Prediction Loss (ENTP-Loss), which treats unclicked exposures as coarse-level hard negatives to promote timely interest fade-out. In online A/B tests on the large-scale Kuaishou short-video recommendation system, DualGR increases video views by 0.527% and watch time by 0.432%, validating it as a practical and effective paradigm for industrial generative retrieval.
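The idea of constraining fine-level decoding to the current coarse bucket can be illustrated with a minimal sketch. This is not the S2D implementation, whose details the abstract does not give. It assumes two-level SIDs of the form (coarse, fine), a precomputed mapping from each coarse token to the fine tokens that exist under it, and model scores supplied from outside. The names `bucket_constrained_decode`, `buckets`, `coarse_scores`, and `toy_fine_scorer` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple


def log_softmax(scores: Dict[int, float]) -> Dict[int, float]:
    """Normalize raw scores over the given candidate set only."""
    m = max(scores.values())
    log_z = m + math.log(sum(math.exp(s - m) for s in scores.values()))
    return {tok: s - log_z for tok, s in scores.items()}


def bucket_constrained_decode(
    coarse_scores: Dict[int, float],
    fine_scorer: Callable[[int, int], float],
    buckets: Dict[int, List[int]],
    beam_width: int,
    top_n: int,
) -> List[Tuple[Tuple[int, int], float]]:
    """Two-level beam search that only scores fine tokens inside each
    surviving coarse bucket (an assumed reading of bucket-constrained decoding)."""
    # Step 1: keep the best `beam_width` coarse tokens.
    coarse_logp = log_softmax(coarse_scores)
    beams = sorted(coarse_logp.items(), key=lambda kv: kv[1], reverse=True)
    beams = beams[:beam_width]

    candidates: List[Tuple[Tuple[int, int], float]] = []
    for coarse, c_lp in beams:
        fine_ids = buckets.get(coarse, [])
        if not fine_ids:
            continue  # no item lives under this coarse token
        # Step 2: normalize over the bucket only, so fine tokens from
        # other buckets can neither be emitted nor absorb probability mass.
        raw = {f: fine_scorer(coarse, f) for f in fine_ids}
        for fine, f_lp in log_softmax(raw).items():
            candidates.append(((coarse, fine), c_lp + f_lp))

    candidates.sort(key=lambda kv: kv[1], reverse=True)
    return candidates[:top_n]


# Toy data (made up for illustration): three coarse buckets.
buckets = {0: [10, 11], 1: [20], 2: [30, 31, 32]}
coarse_scores = {0: 2.0, 1: 0.5, 2: 1.0}


def toy_fine_scorer(coarse: int, fine: int) -> float:
    # Hypothetical stand-in for a decoder's fine-level logit.
    return -float(fine - buckets[coarse][0])


results = bucket_constrained_decode(
    coarse_scores, toy_fine_scorer, buckets, beam_width=2, top_n=3
)
for sid, logp in results:
    print(sid, round(logp, 3))
```

Because the fine-level scores are computed and normalized only over the tokens of one bucket, every decoded pair is a valid SID by construction, and the fine-level cost scales with the bucket size rather than with the full fine vocabulary. In a real system the two scoring inputs would come from the decoder's logits at each step.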