In industrial multi-stage recommenders, a user request first triggers a simple and efficient retriever module that selects and ranks a list of relevant items; the recommender then calls a slower but more sophisticated reranking model that refines the item list exposed to the user. To optimize this two-stage retrieval-reranking framework consistently, most efforts have focused on learning reranker-aware retrievers. In contrast, there has been limited work on how to achieve a retriever-aware reranker. In this work, we provide evidence that the retriever scores from the previous stage are informative but underexplored signals. Specifically, we first show empirically that the reranking task under the two-stage framework is naturally a noise reduction problem on the retriever scores, and show theoretically the limitations of naive techniques for utilizing the retriever scores. Following this notion, we derive an adversarial framework, DNR, that pairs the denoising reranker with a carefully designed noise generation module. The resulting DNR solution extends the conventional score error minimization loss with three augmented objectives: 1) a denoising objective that denoises the noisy retriever scores to align them with the user feedback; 2) an adversarial retriever score generation objective that improves exploration in the retriever score space; and 3) a distribution regularization term that aligns the distribution of generated noisy retriever scores with the real ones. We conduct extensive experiments on three public datasets and an industrial recommender system, together with analytical support, to validate the effectiveness of the proposed DNR.
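To make the two-stage setting concrete, the following is a minimal sketch of a retrieve-then-rerank pipeline in which the reranker receives the retriever scores as an input. It is not the DNR method: the function names (`retrieve`, `rerank`), the dot-product retriever, the `refined_score` callable, and the linear blend controlled by `alpha` are all assumptions made for illustration, and the blend is an example of the kind of naive use of retriever scores that the abstract says has limitations.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user: Vector, items: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Stage 1 (hypothetical): cheap scoring over the whole catalog, keep the top-k."""
    scored = [(item_id, dot(user, vec)) for item_id, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(
    candidates: List[Tuple[str, float]],
    refined_score: Callable[[str], float],
    alpha: float = 0.5,
) -> List[str]:
    """Stage 2 (hypothetical): reorder candidates with a costlier score.

    The retriever score is blended in linearly, which is only a naive
    way of making the reranker retriever-aware.
    """
    rescored = [
        (item_id, alpha * r_score + (1.0 - alpha) * refined_score(item_id))
        for item_id, r_score in candidates
    ]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in rescored]


# Toy data, invented for this example only.
items = {"a": [1.0, 0.0], "b": [0.8, 0.2], "c": [0.0, 1.0]}
user = [1.0, 0.0]
refined = {"a": 0.2, "b": 0.9, "c": 0.5}

candidates = retrieve(user, items, k=2)
final = rerank(candidates, lambda item_id: refined[item_id], alpha=0.5)
```

In this toy run the retriever ranks item `a` above `b`, while the blended second-stage score reverses that order; item `c` never reaches the reranker because it is cut at the retrieval stage.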