Route recommendation systems commonly adopt a multi-stage pipeline of fine-ranking followed by re-ranking to produce high-quality ordered recommendations. However, this paradigm faces three critical limitations. First, offline training objectives are misaligned with online metrics, so offline gains do not necessarily translate into online improvements. Actual performance must be validated through A/B testing, which risks degrading the user experience. Second, redundancy elimination relies on rigid, handcrafted rules that cannot adapt to the high variance in user intent or the unstructured complexity of real-world scenarios. Third, the strict separation between the fine-ranking and re-ranking stages leads to sub-optimal performance. Because each module is optimized in isolation, the fine-ranking stage remains oblivious to the list-level objectives (e.g., diversity) targeted by the re-ranker, which prevents the system from reaching a jointly optimal solution. To overcome these intertwined challenges, we propose SCASRec (Self-Correcting and Auto-Stopping Recommendation), a unified generative framework that integrates ranking and redundancy elimination into a single end-to-end process. SCASRec introduces a stepwise corrective reward (SCR) that guides list-wise refinement by focusing on hard samples, and employs a learnable End-of-Recommendation (EOR) token that terminates generation adaptively when no further improvement is expected. Experiments on two large-scale, open-sourced route recommendation datasets demonstrate that SCASRec achieves state-of-the-art (SOTA) results in both offline and online settings. SCASRec has been fully deployed in a real-world navigation app, demonstrating its effectiveness in practice.
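To make the auto-stopping idea concrete, the following is a minimal sketch of stepwise list generation that halts once an End-of-Recommendation option outscores every remaining route. It is not the SCASRec model: the function `generate_list`, the hand-written step score (relevance minus maximum similarity to already selected routes), and the fixed `eor_score` threshold are all assumptions made for illustration, standing in for components the framework learns.

```python
from typing import Callable, List, Sequence


def generate_list(
    relevance: Sequence[float],
    similarity: Callable[[int, int], float],
    eor_score: float,
    max_len: int,
) -> List[int]:
    """Greedy stepwise generation that stops when EOR beats every candidate.

    Assumption: the step score is a hand-written stand-in for a learned
    model. A candidate scores its relevance minus its maximum similarity
    to the routes already placed in the list, so near-duplicates of
    selected routes are pushed down at each step.
    """
    selected: List[int] = []
    remaining = set(range(len(relevance)))

    while remaining and len(selected) < max_len:

        def step_score(i: int) -> float:
            redundancy = max((similarity(i, j) for j in selected), default=0.0)
            return relevance[i] - redundancy

        # Iterate in sorted order so ties resolve to the lowest index.
        best = max(sorted(remaining), key=step_score)

        # EOR is modeled as a constant score (hypothetical); stop when no
        # remaining route is expected to improve the list beyond it.
        if step_score(best) <= eor_score:
            break

        selected.append(best)
        remaining.remove(best)

    return selected


# Hypothetical toy data: routes 0 and 1 are near-duplicates.
relevance = [0.9, 0.85, 0.4, 0.2]


def sim(i: int, j: int) -> float:
    return 0.8 if {i, j} == {0, 1} else 0.1


print(generate_list(relevance, sim, eor_score=0.25, max_len=4))
```

In this toy run, route 1 is highly relevant but is skipped because it nearly duplicates route 0, and generation ends after two routes because neither remaining candidate clears the EOR score. Ranking and redundancy elimination thus happen in one pass rather than in separate stages.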