The pre-ranking stage of industrial recommendation systems faces a fundamental conflict between efficiency and effectiveness. Powerful models such as Target Attention (TA) excel at capturing complex feature interactions in the ranking stage, but their high computational cost makes them infeasible for pre-ranking, which typically relies on simple vector-product models. This disparity creates a significant performance bottleneck for the entire system. To bridge this gap, we propose TARQ, a novel pre-ranking framework. Inspired by generative models, TARQ's key innovation is to equip pre-ranking with an architecture that approximates TA by means of Residual Quantization. This brings the modeling power of TA into the latency-critical pre-ranking stage for the first time, establishing a new state-of-the-art trade-off between accuracy and efficiency. Extensive offline experiments and large-scale online A/B tests at Taobao demonstrate that TARQ significantly improves ranking performance. The model has consequently been fully deployed in production, where it serves tens of millions of daily active users and yields substantial business improvements. The code and data are available at https://github.com/zyody/tarq_sigir2026.
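For readers unfamiliar with Residual Quantization, the following is a minimal, generic sketch of the technique and not TARQ's actual architecture, which the abstract does not detail. It assumes a vector is encoded greedily across several codebooks, each level quantizing the residual left by the previous one. The function name `residual_quantize`, the random codebooks, and the sizes `d`, `K`, and `levels` are all hypothetical choices made for illustration.

```python
import numpy as np


def residual_quantize(x, codebooks):
    """Greedily quantize vector x with a list of codebooks of shape (K, d).

    Returns the chosen code index per level and the reconstruction,
    which is the sum of the selected codewords.
    """
    residual = np.asarray(x, dtype=np.float64).copy()
    recon = np.zeros_like(residual)
    codes = []
    for cb in codebooks:
        # Pick the codeword closest (squared L2) to the current residual.
        dists = np.sum((cb - residual) ** 2, axis=1)
        idx = int(np.argmin(dists))
        codes.append(idx)
        recon += cb[idx]
        # The next level only has to explain what is still unexplained.
        residual -= cb[idx]
    return codes, recon


# Hypothetical setup: random codebooks stand in for learned ones.
rng = np.random.default_rng(0)
d, K, levels = 8, 16, 3
codebooks = [rng.normal(scale=1.0 / (2 ** l), size=(K, d)) for l in range(levels)]
x = rng.normal(size=d)

codes, recon = residual_quantize(x, codebooks)
print("codes:", codes)
print("reconstruction error:", float(np.linalg.norm(x - recon)))
```

In a learned setting the codebooks would be trained rather than sampled, and the short code sequence would stand in for the original vector so that per-code computations can be shared or cached. How TARQ uses such codes to approximate TA is specified in the paper itself, not here.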