To balance effectiveness and efficiency in recommender systems, multi-stage pipelines commonly use lightweight two-tower models for large-scale candidate retrieval. However, the isolated two-tower architecture restricts representation capacity, embedding-space alignment, and cross-feature interactions. Existing solutions such as late interaction and knowledge distillation can mitigate these issues, but often increase latency or are difficult to deploy in online learning settings. We propose Capability Synergy (CS3), an efficient online framework that strengthens two-tower retrievers while preserving real-time constraints. CS3 introduces three mechanisms: (1) Cycle-Adaptive Structure for self-revision via adaptive feature denoising within each tower; (2) Cross-Tower Synchronization to improve alignment through lightweight mutual awareness between towers; and (3) Cascade-Model Sharing to enhance cross-stage consistency by reusing knowledge from downstream models. CS3 is plug-and-play with diverse two-tower backbones and compatible with online learning. Experiments on three public datasets show consistent gains over strong baselines, and deployment in a large-scale advertising system yields up to an 8.36% revenue improvement across three scenarios while maintaining ms-level latency.
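To make the two-tower setting concrete, the sketch below shows why the architecture is efficient but "isolated": each tower encodes its inputs independently into a shared embedding space, and candidates are scored by a simple inner product, so item embeddings can be precomputed and retrieved with nearest-neighbour search. This is a minimal illustration with random linear towers and toy dimensions, not the CS3 method or any specific production model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear "towers" standing in for the user and item encoders.
# In practice each tower is an MLP over its own features; random
# projections are used here purely for illustration.
D_USER, D_ITEM, D_EMB = 8, 6, 4
W_user = rng.normal(size=(D_USER, D_EMB))
W_item = rng.normal(size=(D_ITEM, D_EMB))

def user_tower(x):
    # Encode user features into the shared embedding space.
    return x @ W_user

def item_tower(x):
    # Encode item features independently of the user tower:
    # no cross-feature interaction happens before scoring.
    return x @ W_item

# Score = inner product, so all item embeddings can be pre-encoded
# offline and served with approximate nearest-neighbour search.
users = rng.normal(size=(2, D_USER))     # 2 requests
items = rng.normal(size=(100, D_ITEM))   # 100 candidate items
scores = user_tower(users) @ item_tower(items).T   # shape (2, 100)
top_k = np.argsort(-scores, axis=1)[:, :5]         # top-5 candidates per user
print(top_k.shape)
```

The late interaction between towers is exactly one dot product, which is what keeps retrieval at ms-level latency but also what limits alignment and interaction capacity, motivating mechanisms like the three introduced by CS3.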