Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting

In software engineering, deep learning models are increasingly deployed for critical tasks such as bug detection and code review. However, overfitting remains a challenge that affects the quality, reliability, and trustworthiness of software systems that use these models. Overfitting can be (1) prevented (e.g., using dropout or early stopping) or (2) detected in a trained model (e.g., using correlation-based approaches). Current detection and prevention approaches both have constraints (e.g., they require modifying the model structure or demand substantial computing resources). In this paper, we propose a simple yet powerful approach that can both detect and prevent overfitting based on the training history (i.e., validation losses). Our approach first trains a time series classifier on the training histories of overfit models. This classifier is then used to detect whether a trained model is overfit. The same classifier can also prevent overfitting by identifying the optimal point at which to stop a model's training. We evaluate our approach on its ability to identify and prevent overfitting in real-world samples, comparing it against correlation-based detection approaches and the most commonly used prevention approach (i.e., early stopping). Our approach achieves an F1 score of 0.91, which is at least 5% higher than the current best-performing non-intrusive overfitting detection approach. Furthermore, our approach stops training earlier than early stopping at least 32% of the time, and its rate of returning the best model is the same or better.
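The abstract does not say which time series classifier or which features are used, so the following is only a minimal sketch of the general idea, under loud assumptions: the validation-loss curves are synthetic, the two hand-picked features and the nearest-centroid classifier are stand-ins chosen for brevity, and every function name (`make_history`, `features`, `fit_centroids`, `predict`, `stop_epoch`, `f1_score`) is hypothetical. It shows a classifier fitted on labeled training histories, then reused both to flag a finished run as overfit and to pick a stopping epoch from the history seen so far.

```python
import math
import random

def make_history(overfit, epochs=40, seed=0):
    """Synthetic validation-loss curve (hypothetical data, not from the paper)."""
    rng = random.Random(seed)
    turn = rng.randint(10, 20)            # epoch where an overfit run starts to degrade
    losses = []
    for t in range(epochs):
        loss = 1.0 / (1.0 + 0.3 * t)      # steady improvement
        if overfit and t > turn:
            loss += 0.02 * (t - turn)     # validation loss climbs back up
        losses.append(loss + rng.gauss(0.0, 0.003))
    return losses

def features(losses):
    """Relative rise of the last loss above the minimum, and the
    fraction of the run that comes after the minimum."""
    best = min(losses)
    best_epoch = losses.index(best)
    rise = (losses[-1] - best) / best
    tail = (len(losses) - 1 - best_epoch) / len(losses)
    return (rise, tail)

def fit_centroids(histories, labels):
    """Nearest-centroid stand-in classifier: one mean feature vector per class."""
    centroids = {}
    for label in set(labels):
        rows = [features(h) for h, y in zip(histories, labels) if y == label]
        centroids[label] = tuple(sum(col) / len(rows) for col in zip(*rows))
    return centroids

def predict(centroids, losses):
    """Return True if the history is closer to the overfit centroid."""
    f = features(losses)
    return min(centroids, key=lambda label: math.dist(f, centroids[label]))

def stop_epoch(centroids, losses, min_epochs=5):
    """First epoch at which the history seen so far is classified as overfit."""
    for t in range(min_epochs, len(losses) + 1):
        if predict(centroids, losses[:t]):
            return t
    return None

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for boolean labels."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Odd-numbered runs are the overfit ones in this synthetic setup.
train_hist = [make_history(i % 2 == 1, seed=i) for i in range(20)]
train_labels = [i % 2 == 1 for i in range(20)]
clf = fit_centroids(train_hist, train_labels)

test_hist = [make_history(i % 2 == 1, seed=100 + i) for i in range(10)]
test_labels = [i % 2 == 1 for i in range(10)]
f1 = f1_score(test_labels, [predict(clf, h) for h in test_hist])
stop_overfit = stop_epoch(clf, test_hist[1])   # an overfit run
stop_healthy = stop_epoch(clf, test_hist[0])   # a run that keeps improving

print("F1 on synthetic held-out curves:", f1)
print("stop epoch for the overfit run:", stop_overfit)
print("stop epoch for the healthy run:", stop_healthy)
```

Because the synthetic curves are cleanly separable, the scores this toy produces say nothing about the reported results; the point is only the workflow of fitting on histories once and reusing the same classifier for detection and for stopping.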


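Overfitting

In AI, overfitting usually means that a learned model is too complex: it performs well on the training set but poorly on the test set, so it generalizes badly. Overfitting (also called over-learning) arises during parameter fitting because the training data contains sampling error, and a complex model fits that sampling error as well.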
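The gap between training and test performance can be illustrated with a small sketch. Everything here is an assumption made for illustration: the data are synthetic noisy samples of y = 2x, the "complex" model is a 1-nearest-neighbour memorizer that reproduces each training label exactly, and the "simple" model is a least-squares line. The names `sample`, `fit_line`, `fit_memorizer`, and `mse` are hypothetical.

```python
import random

rng = random.Random(0)

def sample(n):
    """Noisy samples of y = 2x (hypothetical data)."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0.0, 0.5)) for x in xs]

train, test = sample(300), sample(300)

def fit_line(points):
    """Ordinary least squares for a single feature."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)

def fit_memorizer(points):
    """1-nearest-neighbour: returns the label of the closest training point,
    so it also reproduces the sampling noise in every training label."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

line, memo = fit_line(train), fit_memorizer(train)
results = {
    "line": (mse(line, train), mse(line, test)),
    "memorizer": (mse(memo, train), mse(memo, test)),
}
for name, (tr, te) in results.items():
    print(f"{name:10s} train MSE={tr:.3f}  test MSE={te:.3f}")
```

The memorizer's training error is zero because it has fitted the noise itself, which is exactly why its test error is worse than that of the simpler line.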
