Modern recommendation systems predominantly train retrieval and ranking as separate models, even though both increasingly rely on large transformers that encode the same user behavior data; this duplicates parameters, compute, and serving cost. Prior work unifies the model architecture but not the full pipeline: input formats, training procedures, and serving stacks remain fragmented across stages. We present UniPinRec, which achieves full-stack unification of retrieval and ranking at Pinterest: one input format, one model, and one training stage, deployed within existing serving infrastructure. A shared transformer encodes the user action sequence into candidate-independent representations, which branch into retrieval (ANN dot-product) and ranking (cross-attention) via task-specific heads. Three ideas make this work: (1) Masked Action Modeling (MAM) eliminates interleaving, enabling weight sharing without doubling context length; (2) blended training examples pair action sequences with feedview impression slates to satisfy both objectives jointly; (3) cross-stage KV cache sharing reuses user-history computation from retrieval for ranking, reducing total FLOPs relative to serving two independent models. Deployed on Pinterest's core surfaces, UniPinRec delivers approximately +1% online engagement lift while cutting end-to-end serving latency by 11.1% and raising QPS by 63.6%. To our knowledge, this is the first full-stack unification of retrieval and ranking, covering inputs, model, training, and serving, to be deployed in a production recommendation system.
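The serving flow described in the abstract (encode the user history once, score candidates by dot product for retrieval, then rerank the survivors with cross-attention over the same cached user state) can be illustrated with a minimal pure-Python sketch. This is not UniPinRec's implementation: `encode_user`, `UserState`, `retrieval_scores`, `ranking_score`, and `recommend` are hypothetical names, mean pooling stands in for the shared transformer, an exact scan stands in for ANN search, and Masked Action Modeling and blended training are not shown.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class UserState:
    """Candidate-independent user state, computed once per request."""
    keys: List[Vector]      # cached keys over the action sequence
    values: List[Vector]    # cached values over the action sequence
    user_embedding: Vector  # pooled vector used by the retrieval head


def encode_user(actions: List[Vector]) -> UserState:
    # Assumption: a toy stand-in for the shared transformer. Keys and values
    # are the raw action embeddings; the user embedding is their mean.
    dim = len(actions[0])
    mean = [sum(a[i] for a in actions) / len(actions) for i in range(dim)]
    return UserState(keys=actions, values=actions, user_embedding=mean)


def retrieval_scores(state: UserState, candidates: List[Vector]) -> List[float]:
    # Retrieval head: dot product with the user embedding (exact scan here,
    # where a production system would use an ANN index).
    return [dot(state.user_embedding, c) for c in candidates]


def ranking_score(state: UserState, candidate: Vector) -> float:
    # Ranking head: the candidate is the query and attends over the cached
    # user-history keys/values, so the history is never re-encoded.
    scale = math.sqrt(len(candidate))
    weights = softmax([dot(candidate, k) / scale for k in state.keys])
    context = [
        sum(w * v[i] for w, v in zip(weights, state.values))
        for i in range(len(candidate))
    ]
    return dot(context, candidate)


def recommend(actions: List[Vector], candidates: List[Vector], k: int) -> List[int]:
    state = encode_user(actions)  # computed once, shared by both stages
    scores = retrieval_scores(state, candidates)
    top_k = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top_k, key=lambda i: ranking_score(state, candidates[i]), reverse=True)
```

In this sketch the saving comes from `recommend` calling `encode_user` a single time and passing the resulting `UserState` to both heads; two independent models would each pay for encoding the same action sequence.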