Label noise refers to incorrect labels in a dataset caused by human error or defects in data collection. It is common in real-world applications and can significantly reduce model accuracy. This report explores how to estimate noise transition matrices and how to build deep learning classifiers that are robust to label noise. When the transition matrix is known, we apply forward correction and importance reweighting, both of which use the matrix to correct for the effect of label noise. When the transition matrix is unknown or inaccurate, we use the anchor point assumption to estimate it and T-Revision-based methods to refine the estimate. We further extend T-Revision with two variants, T-Revision-Alpha and T-Revision-Softmax, designed to improve its stability and robustness. As baselines, we implement two classifiers trained with the cross-entropy loss: a Multi-Layer Perceptron (MLP) and ResNet-18. On the FashionMNIST dataset, whose noise transition matrices are known, we compare the methods on both clean-label prediction and transition-matrix estimation. On CIFAR-10, where the noise transition matrix is unknown, we estimate the matrix and evaluate each method's ability to predict clean labels.
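To make the two correction methods and the anchor-point estimator more concrete, here is a minimal NumPy sketch. It assumes the row convention `T[i, j] = P(noisy label = j | clean label = i)` and that the inputs are softmax probabilities from an already-trained network. The function names are hypothetical and not taken from the report's code. For simplicity, the anchor-point estimate takes the single highest-scoring sample per class instead of a more robust percentile-based choice.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward_corrected_nll(clean_probs, noisy_labels, T):
    # Forward correction (sketch): map the model's clean-class posterior
    # p(y | x) to a noisy-label posterior p(y~ | x) = p(y | x) @ T,
    # then take cross-entropy against the observed noisy labels.
    noisy_probs = clean_probs @ T
    idx = np.arange(len(noisy_labels))
    return float(-np.mean(np.log(noisy_probs[idx, noisy_labels] + 1e-12)))


def importance_weights(clean_probs, noisy_labels, T):
    # Importance reweighting (sketch): weight each example by
    # p(y = y~ | x) / p(y~ | x), where T supplies the denominator.
    noisy_probs = clean_probs @ T
    idx = np.arange(len(noisy_labels))
    return clean_probs[idx, noisy_labels] / (noisy_probs[idx, noisy_labels] + 1e-12)


def estimate_T_anchor(noisy_probs):
    # Anchor-point estimate (sketch): for each class i, treat the sample with
    # the highest predicted noisy posterior for i as an anchor point. If that
    # sample truly belongs to class i, its noisy posterior equals row i of T.
    num_classes = noisy_probs.shape[1]
    T_hat = np.zeros((num_classes, num_classes))
    for i in range(num_classes):
        anchor = np.argmax(noisy_probs[:, i])
        T_hat[i] = noisy_probs[anchor]
    return T_hat


# Usage: a symmetric 3-class noise matrix and one perfect anchor per class.
T = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
noisy_posteriors = np.eye(3) @ T  # noisy posteriors of the anchor samples
T_hat = estimate_T_anchor(noisy_posteriors)

probs = softmax(np.array([[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]))
labels = np.array([0, 2])
loss = forward_corrected_nll(probs, labels, T_hat)
weights = importance_weights(probs, labels, T_hat)
```

Passing `T = np.eye(3)` reduces the forward-corrected loss to plain cross-entropy and makes every importance weight equal to 1, which is a quick sanity check for either method.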