Route recommendation systems commonly adopt a multi-stage pipeline of fine-ranking and re-ranking to produce high-quality ordered recommendations. However, this paradigm faces three critical limitations. First, offline training objectives are misaligned with online metrics: offline gains do not necessarily translate into online improvements, so actual performance must be validated through A/B testing, which can degrade the user experience. Second, redundancy elimination relies on rigid, handcrafted rules that cannot adapt to the high variance in user intent or the unstructured complexity of real-world scenarios. Third, the strict separation between the fine-ranking and re-ranking stages leads to sub-optimal performance: because each module is optimized in isolation, the fine-ranking stage remains oblivious to the list-level objectives (e.g., diversity) targeted by the re-ranker, preventing the system from reaching a jointly optimized global optimum. To overcome these intertwined challenges, we propose SCASRec (Self-Correcting and Auto-Stopping Recommendation), a unified generative framework that integrates ranking and redundancy elimination into a single end-to-end process. SCASRec introduces a stepwise corrective reward (SCR) that guides list-wise refinement by focusing on hard samples, and employs a learnable End-of-Recommendation (EOR) token to terminate generation adaptively once no further improvement is expected. Experiments on two large-scale, open-source route recommendation datasets demonstrate that SCASRec achieves state-of-the-art performance in both offline and online settings. SCASRec has been fully deployed in a real-world navigation app, further demonstrating its effectiveness.