The one-epoch overfitting problem has drawn widespread attention, especially in CTR and CVR estimation models in search, advertising, and recommendation domains. These models, which rely heavily on large-scale sparse categorical features, often suffer a significant decline in performance when trained for multiple epochs. Although recent studies have proposed heuristic solutions, the fundamental cause of this phenomenon remains unclear. In this work, we present a theoretical explanation, grounded in Rademacher complexity and supported by empirical experiments, of why overfitting occurs in models with large-scale sparse categorical features. Based on this analysis, we propose a regularization method that adaptively constrains the norm budget of embedding layers. Our approach not only prevents the severe performance degradation observed during multi-epoch training, but also improves model performance within a single epoch. This method has already been deployed in online production systems.
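The core idea of constraining the norm budget of embedding rows can be sketched as follows. This is a minimal illustration, not the paper's method: the abstract does not specify the adaptive rule, so this sketch simply projects each embedding row onto an L2 ball of a fixed radius `budget`; the function name and the fixed budget are illustrative assumptions.

```python
import math

def clip_row_norms(embedding, budget):
    """Project each embedding row onto the L2 ball of radius `budget`.

    A hedged sketch of norm-budget regularization for sparse embedding
    tables. In practice `budget` would be chosen adaptively during
    training; here it is a fixed hyperparameter for illustration.
    """
    clipped = []
    for row in embedding:
        norm = math.sqrt(sum(x * x for x in row))
        if norm > budget:
            # Rescale the row so its L2 norm equals the budget.
            scale = budget / norm
            row = [x * scale for x in row]
        clipped.append(row)
    return clipped
```

In a typical training loop, such a projection would be applied to the rows of the embedding table touched by each mini-batch, since with sparse categorical features only a small subset of rows is updated per step.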