UniPinRec: Unifying Generative Retrieval and Ranking at Pinterest Scale

Hanyu Li, Yi-Ping Hsu, Aditya Mantha, Prabhat Agarwal, Laksh Bhasin, Jialu Wang, Hongtao Lin, Bella Huang, Yaxin Li, Xinyi Li, Chuxi Wang, Kousik Rajesh, Hooshmand Shokri Razaghi, Shunyao Li, Zongyue Qin, Jaewon Yang, James Li, Dhruvil Deven Badani, Jiajing Xu, Charles Rosenberg

Modern recommendation systems predominantly train retrieval and ranking as separate models, even though both increasingly rely on large transformers that encode the same user behavior data. This duplicates parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations that branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs versus serving two independent models. Deployed on Pinterest's core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and lifting QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training, and serving, deployed in a production recommendation system.
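
The serving flow described in the abstract (encode the user history once, then branch into a dot-product retrieval head and a cross-attention ranking head that reuses the same cached states) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`encode_user_history`, `retrieve`, `rank`), the identity "encoder", the mean-pooled user vector, and the brute-force scan standing in for an ANN index are all assumptions made for illustration, and no trained weights are involved.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def encode_user_history(action_embeddings):
    """Hypothetical stand-in for the shared transformer.

    Returns candidate-independent state computed once per request:
    per-action keys/values (the "cache") plus a pooled user vector.
    The toy encoder is just the identity; a real model would apply
    transformer layers here.
    """
    states = [list(e) for e in action_embeddings]
    dim = len(states[0])
    user_vec = [sum(s[i] for s in states) / len(states) for i in range(dim)]
    return {"keys": states, "values": states, "user_vec": user_vec}


def retrieve(cache, item_embeddings, k):
    """Retrieval head: exact dot-product top-k.

    A production system would query an ANN index instead of scanning
    every item; brute force is used here only to stay self-contained.
    """
    scored = sorted(
        item_embeddings.items(),
        key=lambda kv: dot(cache["user_vec"], kv[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:k]]


def rank(cache, item_embeddings, candidate_ids):
    """Ranking head: each candidate cross-attends over the cached history.

    The keys/values come from the same cache used by retrieval, so the
    user history is not re-encoded for the ranking stage.
    """
    scale = math.sqrt(len(cache["user_vec"]))
    scores = {}
    for cid in candidate_ids:
        query = item_embeddings[cid]
        weights = softmax([dot(query, key) / scale for key in cache["keys"]])
        context = [
            sum(w * v[i] for w, v in zip(weights, cache["values"]))
            for i in range(len(query))
        ]
        scores[cid] = dot(query, context)
    return sorted(candidate_ids, key=lambda c: scores[c], reverse=True)


# Toy data (made up): three past actions and four candidate items in 2-D.
history = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0]]
items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, -1.0], "d": [0.5, 0.5]}

cache = encode_user_history(history)       # computed once
candidates = retrieve(cache, items, k=3)   # stage 1 uses the pooled vector
ranked = rank(cache, items, candidates)    # stage 2 reuses the same cache
```

The point of the sketch is the data flow rather than the scoring functions: `encode_user_history` runs once, and both heads read from its output, which is the property that lets a unified model avoid paying for two separate user-history encoders.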

