We introduce a novel method for training machine learning models in the presence of noisy labels, which are prevalent in domains such as medical diagnosis and autonomous driving and can severely degrade a model's generalization performance. Motivated by the well-documented observation that deep learning models tend to overfit noisy samples in the later epochs of training, we propose a strategy that leverages each sample's distance to the class centroids in latent space and applies a discounting mechanism to diminish the influence of samples that lie far from all class centroids. In this way, we effectively counteract the adverse effects of noisy labels. The underlying premise of our approach is that samples situated far from their respective class centroid in the initial stages of training are more likely to be mislabeled. Our methodology is grounded in robust theoretical principles and is validated empirically through extensive experiments on several benchmark datasets. The results show that our method consistently outperforms existing state-of-the-art techniques, achieving significant improvements in classification accuracy in the presence of noisy labels. The code for our proposed loss function and supplementary materials is available at https://github.com/wanifarooq/NCOD.
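The centroid-distance discounting idea can be illustrated with a minimal sketch. This is not the authors' actual NCOD loss: the per-class mean centroid and the Gaussian-kernel discount used below are assumptions made for illustration only.

```python
# Illustrative sketch of centroid-distance-based sample discounting.
# NOT the authors' NCOD implementation; the centroid estimate (per-class
# mean) and discount function (Gaussian kernel) are assumptions.
import numpy as np

def class_centroids(features, labels, num_classes):
    """Mean latent-space embedding per class."""
    centroids = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centroids[c] = features[mask].mean(axis=0)
    return centroids

def discount_weights(features, centroids, temperature=1.0):
    """Weight each sample by proximity to its nearest centroid.

    Samples far from *all* centroids receive weights near zero,
    diminishing their influence on the training loss.
    """
    # Pairwise distances, shape (n_samples, n_classes).
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    nearest = dists.min(axis=1)
    return np.exp(-nearest / temperature)

# Toy usage: two tight clusters plus one distant outlier whose
# label (class 0) is likely noise.
feats = np.array([[0.0, 0.0], [0.1, 0.0],
                  [5.0, 5.0], [5.1, 5.0],
                  [20.0, 20.0]])
labs = np.array([0, 0, 1, 1, 0])
cents = class_centroids(feats, labs, num_classes=2)
w = discount_weights(feats, cents)
# The outlier's weight is far smaller than every inlier's weight,
# so a weighted loss would largely ignore it.
```

These weights would multiply the per-sample loss terms, so that samples suspected of carrying noisy labels contribute little to the gradient.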