Equip Pre-ranking with Target Attention by Residual Quantization

The pre-ranking stage in industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. Powerful models such as Target Attention (TA) excel at capturing complex feature interactions in the ranking stage, but their high computational cost makes them infeasible for pre-ranking, which often relies on simple vector-product models. This disparity creates a significant performance bottleneck for the entire system. To bridge the gap, we propose TARQ, a novel pre-ranking framework. Inspired by generative models, TARQ's key innovation is to equip pre-ranking with an architecture that approximates TA by means of Residual Quantization. This brings the modeling power of TA into the latency-critical pre-ranking stage for the first time and establishes a new state-of-the-art trade-off between accuracy and efficiency. Extensive offline experiments and large-scale online A/B tests at Taobao demonstrate that TARQ significantly improves ranking performance. As a result, the model has been fully deployed in production, where it serves tens of millions of daily active users and yields substantial business improvements. The code and data are available at https://github.com/zyody/tarq_sigir2026.
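
For readers unfamiliar with the term, residual quantization in its generic form can be sketched as below. This is a minimal illustration of the general technique, not TARQ's implementation: the hand-written codebooks, the helper names (`nearest`, `rq_encode`, `rq_decode`), and the toy 2-D vector are all assumptions made for the example. Each level picks the codeword closest to what the previous levels left unexplained, so a vector is represented by a short list of code indices.

```python
# Generic residual quantization sketch (illustrative only, not TARQ's code).
# CODEBOOKS is a hypothetical two-level codebook: a coarse grid, then a finer one.
CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # level 1: coarse
    [[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]],  # level 2: finer
]


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared L2 distance)."""
    def dist(codeword):
        return sum((c - v) ** 2 for c, v in zip(codeword, vec))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def rq_encode(vec, codebooks):
    """Greedily quantize vec: one code index per level, each fitted to the residual."""
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next level quantizes what remains.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected codeword of each level."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


codes = rq_encode([1.6, 0.4], CODEBOOKS)   # [1, 3]
approx = rq_decode(codes, CODEBOOKS)       # [1.5, 0.5]
```

In this toy case the vector `[1.6, 0.4]` is stored as the two indices `[1, 3]` and reconstructed as `[1.5, 0.5]`. In practice, codebooks are learned from data, not written by hand.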

