In software engineering, deep learning models are increasingly deployed for critical tasks such as bug detection and code review. However, overfitting remains a challenge that affects the quality, reliability, and trustworthiness of software systems that rely on these models. Overfitting can be (1) prevented (e.g., using dropout or early stopping) or (2) detected in a trained model (e.g., using correlation-based approaches). Current detection and prevention approaches both have limitations (e.g., they require modifying the model structure or demand substantial computing resources). In this paper, we propose a simple yet powerful approach that can both detect and prevent overfitting based on the training history (i.e., validation losses). Our approach first trains a time series classifier on the training histories of overfit models. This classifier is then used to detect whether a trained model is overfit. In addition, the trained classifier can be used to prevent overfitting by identifying the optimal point at which to stop a model's training. We evaluate our approach on its ability to identify and prevent overfitting in real-world samples. We compare it against correlation-based detection approaches and the most commonly used prevention approach (i.e., early stopping). Our approach achieves an F1 score of 0.91, which is at least 5% higher than the current best-performing non-intrusive overfitting detection approach. Furthermore, in at least 32% of cases our approach stops training earlier than early stopping does, and it has the same or a better rate of returning the best model.
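To make the detect-and-prevent idea concrete, the following is a minimal sketch, not the classifier used in the paper. The abstract does not name a specific time series classifier, so a 1-nearest-neighbour classifier with Euclidean distance stands in for it here. The synthetic loss curves, the window size, the resampling length, and all helper names (`resample`, `NearestNeighborDetector`, `stop_epoch`, `u_shaped`, `decaying`) are assumptions made for illustration, as is the rolling-window scheme used for prevention.

```python
from typing import List, Optional, Sequence

History = Sequence[float]  # validation loss per epoch


def resample(history: History, length: int = 20) -> List[float]:
    """Linearly interpolate a loss curve to a fixed length, then min-max scale it."""
    n = len(history)
    if n == 1:
        values = [float(history[0])] * length
    else:
        values = []
        for i in range(length):
            pos = i * (n - 1) / (length - 1)
            lo = int(pos)
            hi = min(lo + 1, n - 1)
            frac = pos - lo
            values.append(history[lo] * (1 - frac) + history[hi] * frac)
    low, high = min(values), max(values)
    span = (high - low) or 1.0  # avoid division by zero on flat curves
    return [(v - low) / span for v in values]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equally long curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class NearestNeighborDetector:
    """Stand-in time series classifier: 1-NN over normalised loss curves."""

    def __init__(self, length: int = 20) -> None:
        self.length = length
        self.examples: list = []

    def fit(self, histories: Sequence[History], labels: Sequence[bool]):
        self.examples = [
            (resample(h, self.length), bool(y)) for h, y in zip(histories, labels)
        ]
        return self

    def is_overfit(self, history: History) -> bool:
        query = resample(history, self.length)
        _, label = min(self.examples, key=lambda ex: distance(ex[0], query))
        return label


def stop_epoch(detector: NearestNeighborDetector, val_losses: History,
               window: int = 10) -> Optional[int]:
    """Return the first 1-based epoch whose trailing window is classified
    as overfit, or None if no window is ever flagged."""
    for epoch in range(window, len(val_losses) + 1):
        if detector.is_overfit(val_losses[epoch - window:epoch]):
            return epoch
    return None


# Synthetic training histories (made up for this sketch, not real data).
def u_shaped(n: int, turn: int) -> List[float]:
    """Validation loss that falls until index `turn`, then rises."""
    return [(i - turn) ** 2 / n ** 2 + 0.3 for i in range(n)]


def decaying(n: int) -> List[float]:
    """Validation loss that keeps decreasing."""
    return [1.0 / (1 + 0.3 * i) for i in range(n)]


histories = [u_shaped(40, 15), u_shaped(30, 10), u_shaped(50, 20),
             decaying(40), decaying(30), [1 - 0.02 * i for i in range(40)]]
labels = [True, True, True, False, False, False]
detector = NearestNeighborDetector().fit(histories, labels)

# Detection: classify a complete training history.
print(detector.is_overfit(u_shaped(45, 18)))
# Prevention: find the epoch at which training would be halted.
print(stop_epoch(detector, u_shaped(60, 15)))
```

In this sketch the same fitted detector serves both purposes: it labels a finished validation-loss history, and it is applied to a trailing window during training to decide when to stop. A real setup would train on labelled histories from actual models and would likely use a stronger time series classifier.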