When random label noise is added to a training dataset, the prediction error of a neural network on a label-noise-free test dataset initially improves during early training but eventually deteriorates, following a U-shaped dependence on training time. This behaviour is believed to result from neural networks learning the patterns of clean data first and fitting the noise later in training, a phenomenon that we refer to as clean-priority learning. In this study, we explore the learning dynamics underlying this phenomenon. We demonstrate theoretically that, in the early stage of training, the update direction of gradient descent is determined by the clean subset of the training data, while the noisy subset has minimal to no impact, so that learning from clean samples is prioritized. Moreover, we show, both theoretically and experimentally, that as clean-priority learning proceeds, the dominance of the gradients of clean samples over those of noisy samples diminishes, which eventually terminates clean-priority learning and leads the network to fit the noisy samples.
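The gradient-dominance argument can be illustrated with a small toy sketch. It is not the authors' setup: it assumes a linear logistic-regression model trained by full-batch gradient descent on synthetic two-class Gaussian data, with 20% of the labels flipped at random to stand in for label noise. The names `subset_gradients`, `mu` and `noise_rate` are hypothetical. Because a linear model cannot memorize flipped labels, the sketch only shows two of the claims: at initialization the full gradient is almost parallel to the clean-sample gradient, and the ratio of clean to noisy gradient norms shrinks toward 1 as training proceeds. It does not reproduce the U-shaped test-error curve.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def subset_gradients(w, X, y, clean_mask):
    """Split the full-batch logistic-loss gradient into clean and noisy parts.

    Each part is summed over its subset and divided by the total sample count n,
    so clean_grad + noisy_grad equals the full-batch gradient.
    """
    n = X.shape[0]
    margins = y * (X @ w)
    # d/dw log(1 + exp(-y w.x)) = -y * x * sigmoid(-y w.x)
    per_sample = -(y * sigmoid(-margins))[:, None] * X
    clean_grad = per_sample[clean_mask].sum(axis=0) / n
    noisy_grad = per_sample[~clean_mask].sum(axis=0) / n
    return clean_grad, noisy_grad


# Hypothetical synthetic data: class means +mu / -mu plus isotropic Gaussian noise.
rng = np.random.default_rng(0)
n, d, noise_rate = 2000, 5, 0.2
mu = np.ones(d) / np.sqrt(d)
y_true = rng.choice([-1.0, 1.0], size=n)
X = y_true[:, None] * mu + rng.normal(scale=1.0, size=(n, d))

# Flip a random subset of labels; flipped samples form the "noisy" subset.
flip = rng.random(n) < noise_rate
y = np.where(flip, -y_true, y_true)
clean_mask = ~flip

# At initialization: how well does the full gradient align with the clean gradient?
g_clean0, g_noisy0 = subset_gradients(np.zeros(d), X, y, clean_mask)
g0 = g_clean0 + g_noisy0
cos0 = g0 @ g_clean0 / (np.linalg.norm(g0) * np.linalg.norm(g_clean0))

# Full-batch gradient descent, tracking |clean gradient| / |noisy gradient|.
w = np.zeros(d)
lr, steps = 1.0, 3000
ratios = []
for _ in range(steps):
    g_clean, g_noisy = subset_gradients(w, X, y, clean_mask)
    ratios.append(np.linalg.norm(g_clean) / np.linalg.norm(g_noisy))
    w -= lr * (g_clean + g_noisy)

print(f"init:  cos(full, clean) = {cos0:.3f}, |clean|/|noisy| = {ratios[0]:.2f}")
print(f"final: |clean|/|noisy| = {ratios[-1]:.3f}")
```

In this toy run the initial ratio is roughly the ratio of clean to noisy sample counts, because at `w = 0` every sample contributes a gradient of the same magnitude and the flipped samples push in the opposite direction along `mu`. Once the loss is near its minimum the full gradient vanishes, so the clean and noisy contributions cancel and the ratio approaches 1, which is the linear-model analogue of the diminishing dominance described above.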