Multi-Epoch Learning for Deep Click-Through Rate Prediction Models

The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications: model performance degrades sharply at the beginning of the second epoch. Recent work has tried to identify the factors behind this phenomenon through extensive experiments. However, because the best performance is usually achieved with one-epoch training, it remains unknown whether a multi-epoch training paradigm could achieve better results. In this paper, we hypothesize that the phenomenon may be attributed to the embedding layer's susceptibility to overfitting, which can stem from the high-dimensional sparsity of the data. To maintain feature sparsity while avoiding embedding overfitting, we propose a novel Multi-Epoch learning with Data Augmentation (MEDA) method, which can be directly applied to most deep CTR models. MEDA achieves data augmentation by reinitializing the embedding layer in each epoch, thereby avoiding embedding overfitting while improving convergence. To the best of our knowledge, MEDA is the first multi-epoch training paradigm designed for deep CTR prediction models. Extensive experiments on several public datasets verify its effectiveness; notably, MEDA significantly outperforms conventional one-epoch training. MEDA has also shown significant benefits in a real-world deployment at Kuaishou.
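The core idea stated in the abstract, reinitializing the embedding layer in each epoch, can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the toy model (sum-pooled embeddings feeding a single linear layer), the choice to carry the dense weights over between epochs, the Gaussian initializer, and the names `init_embeddings` and `train_multi_epoch` are hypothetical and are not taken from the paper.

```python
import math
import random


def init_embeddings(num_ids, dim, rng):
    """Return a fresh embedding table with small random values (assumed initializer)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(dim)] for _ in range(num_ids)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_multi_epoch(samples, num_ids, dim=4, epochs=3, lr=0.05, seed=0):
    """Train a toy CTR model, re-initializing the embeddings at every epoch.

    samples: list of (feature_ids, label) pairs with label in {0, 1}.
    Returns (embeddings, dense_weights, bias, reinit_count).
    """
    rng = random.Random(seed)
    # A single linear layer stands in for the dense part of a deep CTR model.
    dense = [rng.gauss(0.0, 0.1) for _ in range(dim)]
    bias = 0.0
    embeddings = None
    reinit_count = 0
    for _ in range(epochs):
        # Assumption: only the embedding table is reset; dense weights carry over.
        embeddings = init_embeddings(num_ids, dim, rng)
        reinit_count += 1
        for feature_ids, label in samples:
            # Sum-pool the embeddings of the active feature ids.
            pooled = [sum(embeddings[i][k] for i in feature_ids) for k in range(dim)]
            logit = sum(w * x for w, x in zip(dense, pooled)) + bias
            grad = sigmoid(logit) - label  # gradient of log loss w.r.t. the logit
            # Update embeddings first, using the not-yet-updated dense weights.
            for i in feature_ids:
                for k in range(dim):
                    embeddings[i][k] -= lr * grad * dense[k]
            for k in range(dim):
                dense[k] -= lr * grad * pooled[k]
            bias -= lr * grad
    return embeddings, dense, bias, reinit_count


if __name__ == "__main__":
    toy = [([0, 2], 1), ([1, 3], 0), ([0, 3], 1), ([1, 2], 0)]
    _, dense_w, b, resets = train_multi_epoch(toy, num_ids=4)
    print("embedding resets:", resets)
```

In this sketch each epoch sees the same samples but starts from a new embedding table, so the dense layer is trained against a different embedding space every time. That is one way to read the abstract's description of reinitialization as a form of data augmentation; the paper itself should be consulted for the actual procedure.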

