Ranking in digital marketplaces is a dynamic exposure-allocation mechanism: displayed items shape discovery trajectories, and the platform logs the resulting success events to update future allocation policies. Modern ranking systems rely heavily on exposure-confounded signals (e.g., popularity estimates, CTR/CVR aggregates, and ID-based representations) because they are highly predictive under stationary demand. Yet this predictive power can become a learning shortcut: early access to exposure-dependent belief signals steers optimization toward over-reliance on them and away from exposure-independent merit signals (e.g., content-based competitiveness and semantic affinity). Consequently, the learned policy tends to entrench incumbents and to degrade both cold-start generalization and robustness under distribution shift. We propose Representation Curriculum (RC), a training-time intervention that temporally stages feature utilization. RC foregrounds content-based merit signals initially, then introduces exposure-dependent belief signals while anchoring the content pathway near the learned merit representation. This curbs shortcut reliance on historical signals and mitigates gradient starvation on content signals. We formalize RC independently of task and hypothesis class and provide ranking-specific instantiations. In a Gaussian linear ridge setting, we derive closed-form solutions and sufficient conditions under which RC strictly reduces population risk on a cold-start target distribution, with a quantified Pareto trade-off against source performance. Experiments on public learning-to-rank and recommendation benchmarks, together with randomized online experiments in a large-scale e-commerce search system, show that RC measurably shifts reliance from historical belief signals toward content-based merit signals and yields consistent gains on cold populations with a controlled trade-off in head performance.
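To make the two-stage idea concrete, the following is a minimal sketch in a linear ridge setting, not the formulation used in the paper. It assumes that the anchoring step can be modeled as a quadratic penalty pulling the content weights toward the stage-one solution; the function names `fit_ridge` and `fit_rc`, the parameters `lam` and `anchor`, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge: argmin_w ||Xw - y||^2 + lam * ||w||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def fit_rc(Xc, Xb, y, lam=1.0, anchor=10.0):
    """Two-stage sketch of a representation curriculum (illustrative only).

    Stage 1: fit content (merit) features alone.
    Stage 2: fit content and belief features jointly, penalising the
             content weights for moving away from the stage-1 solution
             (assumed quadratic anchor) and applying plain ridge to the
             belief weights.
    """
    w_merit = fit_ridge(Xc, y, lam)

    X = np.hstack([Xc, Xb])
    dc, db = Xc.shape[1], Xb.shape[1]
    # Diagonal penalty: `anchor` on the content block, `lam` on the belief block.
    P = np.diag(np.concatenate([np.full(dc, anchor), np.full(db, lam)]))
    # Content weights are pulled toward w_merit, belief weights toward zero.
    target = np.concatenate([w_merit, np.zeros(db)])
    # Minimiser of ||Xw - y||^2 + (w - target)^T P (w - target).
    w = np.linalg.solve(X.T @ X + P, X.T @ y + P @ target)
    return w_merit, w[:dc], w[dc:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Xc = rng.normal(size=(400, 3))                           # content features
    merit = Xc @ np.array([1.0, -2.0, 0.5])                  # latent merit
    Xb = merit[:, None] + 0.1 * rng.normal(size=(400, 1))    # belief proxy
    y = merit + 0.1 * rng.normal(size=400)

    w_merit, w_content, w_belief = fit_rc(Xc, Xb, y, lam=1.0, anchor=100.0)
    w_joint = fit_ridge(np.hstack([Xc, Xb]), y, 1.0)
    print("stage-1 content weights:", w_merit)
    print("RC content weights:     ", w_content)
    print("joint-ridge content:    ", w_joint[:3])
```

In this toy setup, a larger `anchor` keeps the content weights closer to the stage-one estimate, while a plain joint ridge fit is free to move weight onto the belief feature. How the anchor strength trades source performance against cold-start risk is the question the closed-form analysis addresses; this sketch does not reproduce that analysis.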