Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training. During the early phase, we find that dropout reduces the directional variance of gradients across mini-batches and helps align the mini-batch gradients with the gradient of the entire dataset. This helps counteract the stochasticity of SGD and limits the influence of individual batches on model training. Our findings lead us to a solution for improving performance in underfitting models, early dropout: dropout is applied only during the initial phase of training and is turned off afterwards. Models equipped with early dropout achieve lower final training loss than their counterparts without dropout. Additionally, we explore a symmetric technique for regularizing overfitting models, late dropout, in which dropout is not used in the early iterations and is only activated later in training. Experiments on ImageNet and various vision tasks demonstrate that our methods consistently improve generalization accuracy. Our results encourage more research on understanding regularization in deep learning, and our methods can be useful tools for future neural network training, especially in the era of large data. Code is available at https://github.com/facebookresearch/dropout.
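
The two schedules described above can be illustrated with a minimal sketch. It is not the released implementation: the function name `drop_rate_at`, the parameters `base_rate` and `switch_epoch`, their default values, and the hard on/off switch at a single epoch are all assumptions made for illustration. The abstract specifies only that early dropout is active in the initial phase and off afterwards, and that late dropout is the reverse.

```python
def drop_rate_at(epoch: int, total_epochs: int, mode: str = "early",
                 base_rate: float = 0.1, switch_epoch: int = 50) -> float:
    """Return the dropout probability to use at a given epoch.

    Hypothetical helper: `base_rate` and `switch_epoch` are illustrative
    values, and the hard switch is an assumed schedule shape.
    """
    if not 0 <= epoch < total_epochs:
        raise ValueError("epoch must satisfy 0 <= epoch < total_epochs")
    if not 0.0 <= base_rate < 1.0:
        raise ValueError("base_rate must be in [0, 1)")

    if mode == "early":
        # Early dropout: active only before the switch point, then off.
        return base_rate if epoch < switch_epoch else 0.0
    if mode == "late":
        # Late dropout: off before the switch point, then active.
        return 0.0 if epoch < switch_epoch else base_rate
    if mode == "standard":
        # Conventional dropout: active for the whole of training.
        return base_rate
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    for mode in ("early", "late", "standard"):
        rates = [drop_rate_at(e, 100, mode=mode) for e in (0, 49, 50, 99)]
        print(mode, rates)
```

In a PyTorch training loop, one plausible way to apply such a schedule is to assign the returned value to the `p` attribute of each `torch.nn.Dropout` module at the start of every epoch.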