In real-world applications, perfect labels are rarely available, which makes it challenging to develop robust machine learning algorithms that can handle noisy labels. Recent methods filter noise based on the discrepancy between model predictions and the given noisy labels, assuming that samples with small classification losses are clean. This work takes a different approach: it leverages the consistency between the learned model and the entire noisy dataset, using the rich representational and topological information in the data. We introduce LaplaceConfidence, a method that obtains label confidence (i.e., clean probabilities) using the Laplacian energy. Specifically, it first constructs graphs from the feature representations of all noisy samples and then minimizes the Laplacian energy to produce a low-energy graph. Clean labels should fit well into the low-energy graph while noisy ones should not, which allows our method to estimate each sample's clean probability. Furthermore, LaplaceConfidence is embedded into a holistic method for robust training, in which a co-training technique generates unbiased label confidence and a label refurbishment technique makes better use of it. We also explore dimensionality reduction to make our method applicable to large-scale noisy datasets. Our experiments demonstrate that LaplaceConfidence outperforms state-of-the-art methods on benchmark datasets under both synthetic and real-world noise.
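The graph-based idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the graph construction or the exact energy being minimized, so the code assumes a cosine-similarity k-nearest-neighbour graph and a standard regularized objective, tr(FᵀLF) + μ‖F − Y‖², whose minimizer solves (L + μI)F = μY. The function names (`knn_affinity`, `laplacian_energy`, `label_confidence`) and the parameters `k` and `mu` are hypothetical and chosen only for illustration.

```python
import numpy as np


def knn_affinity(features, k=5):
    """Symmetric k-nearest-neighbour affinity matrix from cosine similarity.

    Assumption: cosine similarity and a kNN graph are illustrative choices,
    not necessarily the graph construction used in the paper.
    """
    normed = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops from neighbour search
    n = sim.shape[0]
    neighbours = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    W = np.zeros((n, n))
    W[rows, cols] = np.clip(sim[rows, cols], 0.0, None)  # non-negative weights
    return np.maximum(W, W.T)  # symmetrise


def laplacian_energy(W, F):
    """Laplacian energy tr(F^T L F) with the unnormalised Laplacian L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(F.T @ L @ F))


def label_confidence(features, noisy_labels, num_classes, k=5, mu=1.0):
    """Score how well each given label fits a low-energy labelling of the graph.

    Assumption: the low-energy labelling F minimises
    tr(F^T L F) + mu * ||F - Y||_F^2, where Y is the one-hot noisy label matrix.
    """
    W = knn_affinity(features, k)
    L = np.diag(W.sum(axis=1)) - W
    Y = np.eye(num_classes)[noisy_labels]
    # Setting the gradient to zero gives (L + mu * I) F = mu * Y.
    F = np.linalg.solve(L + mu * np.eye(len(Y)), mu * Y)
    # Guard against tiny numerical negatives, then renormalise rows.
    F = np.clip(F, 0.0, None)
    F = F / F.sum(axis=1, keepdims=True)
    # Confidence of a sample = smoothed score assigned to its given label.
    confidence = F[np.arange(len(Y)), noisy_labels]
    return confidence, F
```

In this sketch, a sample whose given label disagrees with most of its neighbours in feature space receives a low score for that label after smoothing, which is the sense in which a noisy label fails to fit the low-energy graph. Thresholding or otherwise using the returned confidence (for example in co-training or label refurbishment) is left out, since the abstract does not describe those steps in detail.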