This paper investigates the one-epoch overfitting phenomenon in Click-Through Rate (CTR) models, where performance notably declines at the start of the second epoch. Despite extensive research, the efficacy of multi-epoch training over the conventional one-epoch approach remains unclear. We identify the overfitting of the embedding layer, caused by high-dimensional data sparsity, as the primary issue. To address this, we introduce a novel and simple Multi-Epoch learning with Data Augmentation (MEDA) framework, suitable for both non-continual and continual learning scenarios, which can be seamlessly integrated into existing deep CTR models and has potential applications to the "forgetting or overfitting" dilemma in retraining and to the well-known catastrophic forgetting problem. MEDA minimizes overfitting by reducing the dependency of the embedding layer on subsequent training data or the Multi-Layer Perceptron (MLP) layers, and achieves data augmentation by training the MLP with varied embedding spaces. Our findings confirm that pre-trained MLP layers can adapt to new embedding spaces, enhancing performance without overfitting. This adaptability underscores the MLP layers' role in learning a matching function focused on the relative relationships among embeddings rather than their absolute positions. To our knowledge, MEDA represents the first multi-epoch training strategy tailored for deep CTR prediction models. Extensive experiments on several public and business datasets fully demonstrate the effectiveness of data augmentation and the superiority of MEDA over conventional single-epoch training. Moreover, MEDA has exhibited significant benefits in a real-world online advertising system.
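The core MEDA mechanism described above can be illustrated with a minimal sketch: the embedding table is re-initialized at the start of each epoch (so each epoch presents the MLP with a fresh embedding space, acting as data augmentation), while the MLP weights carry over and learn relative relationships among embeddings. This is a hedged toy illustration in NumPy, not the authors' implementation; names such as `init_embeddings` and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for sparse-ID features, embeddings, and the MLP.
num_ids, emb_dim, hidden = 1000, 8, 16

def init_embeddings():
    # A fresh, randomly initialized embedding space each time it is called.
    return rng.normal(scale=0.01, size=(num_ids, emb_dim))

# MLP parameters are created once and reused (continued) across all epochs.
W1 = rng.normal(scale=0.1, size=(emb_dim, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, 1))

def forward(emb_table, ids):
    # Look up embeddings, apply a ReLU hidden layer, then a sigmoid CTR score.
    h = np.maximum(emb_table[ids] @ W1, 0.0)
    return 1.0 / (1.0 + np.exp(-(h @ W2)))

num_epochs = 3
for epoch in range(num_epochs):
    # MEDA-style multi-epoch training: re-initialize the embedding layer each
    # epoch so it cannot overfit across epochs; the MLP sees a varied embedding
    # space every epoch while its own weights persist.
    emb_table = init_embeddings()
    # ... gradient updates to emb_table, W1, W2 on this epoch's data ...
```

In a real deep CTR model the same idea amounts to resetting only the embedding parameters between epochs while keeping the dense-layer optimizer state and weights intact.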