Robust loss functions are designed to combat the adverse impacts of label noise, and their robustness is typically supported by theoretical bounds that are agnostic to the training dynamics. However, these bounds may fail to characterize empirical performance, since it remains unclear why robust loss functions can underfit. We show that most loss functions can be rewritten into a form with the same class-score margin but different sample-weighting functions. The resulting curriculum view provides a straightforward analysis of the training dynamics, which helps attribute underfitting to diminished average sample weights and noise robustness to larger weights for clean samples. We show that simple fixes to the curriculums can make underfitting robust loss functions competitive with the state of the art, and that training schedules can substantially affect noise robustness even with robust loss functions. Code is available at \url{github}.
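To make the rewriting concrete, the following is a minimal sketch and not the paper's code. It assumes the class-score margin is defined as s_y - log(sum over i != y of exp(s_i)), under which the softmax probability of the labeled class equals the sigmoid of the margin. With that assumption, the gradient of cross entropy (CE) and of mean absolute error (MAE, taken here as 1 - p_y with the constant factor dropped) both point along the gradient of the margin and differ only in a scalar sample weight. The function names `margin`, `ce_weight`, and `mae_weight` are hypothetical and chosen for illustration.

```python
import math


def margin(scores, y):
    """Assumed margin: s_y - log(sum_{i != y} exp(s_i))."""
    others = [s for i, s in enumerate(scores) if i != y]
    m = max(others)  # shift by the max for numerical stability
    lse = m + math.log(sum(math.exp(s - m) for s in others))
    return scores[y] - lse


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ce_weight(scores, y):
    # CE = -log p_y with p_y = sigmoid(margin), so
    # d(CE)/d(margin) = -(1 - p_y); the sample weight is 1 - p_y.
    p = sigmoid(margin(scores, y))
    return 1.0 - p


def mae_weight(scores, y):
    # MAE = 1 - p_y (constant factor dropped), so
    # d(MAE)/d(margin) = -p_y * (1 - p_y); the sample weight is p_y * (1 - p_y).
    p = sigmoid(margin(scores, y))
    return p * (1.0 - p)


if __name__ == "__main__":
    # Hypothetical scores: one confidently misclassified sample and one ambiguous sample.
    for scores in ([-5.0, 5.0, 0.0], [0.1, 0.0, -0.1]):
        print(scores, "CE weight:", ce_weight(scores, 0), "MAE weight:", mae_weight(scores, 0))
```

Under these assumptions, a sample whose labeled class receives a very low probability keeps a CE weight close to 1 but an MAE weight close to 0. This is consistent with the abstract's reading that smaller sample weights can relate both to underfitting and to robustness against mislabeled samples.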