Pre-ranking is a critical stage in industrial recommendation systems, tasked with efficiently scoring thousands of recalled items for downstream ranking. A key challenge is the training-serving discrepancy: pre-ranking models are trained only on exposed interactions, yet must score all recalled candidates -- including unexposed items -- during online serving. This mismatch not only induces severe sample selection bias but also degrades generalization, especially for long-tail content. Existing debiasing approaches typically rely on heuristics (e.g., negative sampling) or distillation from biased rankers, which either mislabel plausible unexposed items as negatives or propagate exposure bias into pseudo-labels. In this work, we propose Generative Pseudo-Labeling (GPL), a framework that leverages large language models (LLMs) to generate unbiased, content-aware pseudo-labels for unexposed items, explicitly aligning the training distribution with the online serving space. By generating user-specific interest anchors offline and matching them with candidates in a frozen semantic space, GPL provides high-quality supervision without adding online latency. Deployed in a large-scale production system, GPL improves click-through rate by 3.07%, while significantly enhancing recommendation diversity and long-tail item discovery.
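The anchor-matching step described above can be illustrated with a minimal sketch: each LLM-generated interest anchor and each unexposed candidate is represented as an embedding in a frozen semantic space, and a candidate receives a positive pseudo-label when its best-matching anchor exceeds a similarity threshold. The embedding dimensionality, the threshold value, and the function names here are illustrative assumptions, not details from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pseudo_labels(anchor_embs, candidate_embs, threshold=0.5):
    # For each unexposed candidate, take its best-matching interest anchor;
    # label it positive (1) if the similarity clears the threshold, else negative (0).
    labels = []
    for cand in candidate_embs:
        best = max(cosine(anchor, cand) for anchor in anchor_embs)
        labels.append(1 if best >= threshold else 0)
    return labels

# Toy 2-d embeddings: two LLM-generated interest anchors, two recalled candidates.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[0.9, 0.1], [-1.0, 0.0]]
print(pseudo_labels(anchors, candidates))  # -> [1, 0]
```

In practice the embeddings would come from a frozen pretrained encoder, and the labels produced this way would supplement exposed-interaction labels during pre-ranking model training.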