ID-based embeddings are widely used in web-scale online recommendation systems. However, their susceptibility to overfitting, particularly due to the long-tail nature of data distributions, often limits training to a single epoch, a phenomenon known as the "one-epoch problem." This challenge has driven research efforts to optimize performance within the first epoch by enhancing convergence speed or feature sparsity. In this study, we introduce a novel two-stage training strategy that incorporates a pre-training phase using a minimal model with contrastive loss, enabling broader data coverage for the embedding system. Our offline experiments demonstrate that multi-epoch training during the pre-training phase does not lead to overfitting, and the resulting embeddings improve online generalization when fine-tuned for more complex downstream recommendation tasks. We deployed the proposed system in live traffic at Pinterest, achieving significant site-wide engagement gains.
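The pre-training stage described above can be illustrated with a minimal sketch. The code below is a hypothetical, simplified example of training ID embedding tables with an in-batch contrastive (InfoNCE-style) loss, where each engaged (user, item) pair is a positive and the other items in the batch act as negatives; the table sizes, dimensions, and temperature are illustrative assumptions, not Pinterest's actual configuration.

```python
import numpy as np

# Hypothetical sketch: ID embedding tables pre-trained with an
# in-batch contrastive loss. All names and hyperparameters here
# are assumptions for illustration only.
rng = np.random.default_rng(0)
num_ids, dim = 1000, 16
user_table = rng.normal(0.0, 0.1, (num_ids, dim))
item_table = rng.normal(0.0, 0.1, (num_ids, dim))

def info_nce_loss(user_ids, item_ids, temperature=0.1):
    """In-batch contrastive loss: the diagonal of the similarity
    matrix holds the positive pairs; off-diagonal entries serve
    as negatives."""
    u = user_table[user_ids]
    v = item_table[item_ids]
    # L2-normalize so logits are cosine similarities.
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    logits = (u @ v.T) / temperature              # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # cross-entropy on positives

# One batch of engaged (user, item) ID pairs.
batch_users = rng.integers(0, num_ids, size=8)
batch_items = rng.integers(0, num_ids, size=8)
loss = info_nce_loss(batch_users, batch_items)
```

In a real system the loss would be minimized with a gradient-based optimizer over many batches, and the resulting `user_table` / `item_table` rows would then initialize the embedding layers of the downstream ranking model for fine-tuning.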