Representation Curriculum: Stagewise Training for Robust Ranking and Allocation

Ranking in digital marketplaces is a dynamic exposure-allocation mechanism: displayed items shape discovery trajectories, and the platform logs the resulting success events to update future allocation policies. Modern ranking systems rely heavily on exposure-confounded signals (e.g., popularity estimates, CTR/CVR aggregates, and ID-based representations) because they are highly predictive under stationary demand. Yet this predictive power can become a learning shortcut: early access to exposure-dependent belief signals steers optimization toward over-reliance on them and away from exposure-independent merit signals (e.g., content-based competitiveness and semantic affinity). Consequently, the learned policy tends to entrench incumbents and to degrade cold-start generalization and robustness under distribution shift. We propose Representation Curriculum (RC), a training-time intervention that temporally stages feature utilization. RC foregrounds content-based merit signals initially, then introduces exposure-dependent belief signals while anchoring the content pathway near the learned merit representation, curbing shortcut reliance on historical signals and mitigating gradient starvation on content signals. We formalize RC independently of task and hypothesis class and provide ranking-specific instantiations. In a Gaussian linear ridge setting, we derive closed-form solutions and sufficient conditions under which RC strictly reduces population risk on a cold-start target distribution, with a quantified Pareto tradeoff against source performance. Experiments on public learning-to-rank and recommendation benchmarks, and randomized online experiments in a large-scale e-commerce search system, show that RC measurably shifts reliance from historical belief signals toward content-based merit signals and yields consistent gains on cold populations with a controlled trade-off in head performance.
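To make the two-stage idea concrete, the following is a minimal sketch of staged training in a linear ridge setting. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact objective, so the squared-distance anchor penalty, the function names `fit_content_ridge` and `fit_anchored_ridge`, the hyperparameters `lam` and `mu`, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_content_ridge(X_c, y, lam):
    """Stage 1 (assumed form): ridge regression on content features only."""
    d = X_c.shape[1]
    return np.linalg.solve(X_c.T @ X_c + lam * np.eye(d), X_c.T @ y)

def fit_anchored_ridge(X_c, X_e, y, w_anchor, mu, lam):
    """Stage 2 (assumed form): add exposure features, anchor content weights.

    Minimizes ||y - X_c w_c - X_e w_e||^2
              + mu * ||w_c - w_anchor||^2 + lam * ||w_e||^2
    using the closed-form solution of this quadratic objective.
    """
    X = np.hstack([X_c, X_e])
    d_c, d_e = X_c.shape[1], X_e.shape[1]
    # Per-coordinate penalty strengths and the point each block is pulled toward.
    penalty = np.concatenate([np.full(d_c, mu), np.full(d_e, lam)])
    prior = np.concatenate([w_anchor, np.zeros(d_e)])
    w = np.linalg.solve(X.T @ X + np.diag(penalty), X.T @ y + penalty * prior)
    return w[:d_c], w[d_c:]

# Synthetic data (hypothetical): the label depends on content features, and the
# exposure features are noisy copies of the label, so they are highly
# predictive on this source data.
rng = np.random.default_rng(0)
n, d_c, d_e = 500, 4, 2
X_c = rng.normal(size=(n, d_c))
y = X_c @ rng.normal(size=d_c) + 0.1 * rng.normal(size=n)
X_e = y[:, None] + 0.5 * rng.normal(size=(n, d_e))

w_stage1 = fit_content_ridge(X_c, y, lam=1.0)
w_c, w_e = fit_anchored_ridge(X_c, X_e, y, w_stage1, mu=10.0, lam=1.0)
```

In this sketch, `mu` controls how strongly the content weights are held near their stage-1 values: as `mu` grows, `w_c` approaches `w_stage1` and the exposure features can only explain what the content pathway leaves unexplained, while a small `mu` lets the exposure features take over more of the fit.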
