Label noise is widespread in large-scale datasets and significantly degrades the performance of deep learning algorithms. Because the instance-dependent noise transition matrix is non-identifiable, most existing algorithms address the problem by assuming that the noisy label generation process is independent of the instance features. Unfortunately, noisy labels in real-world applications often depend on both the true label and the features. In this work, we tackle instance-dependent label noise with a novel deep generative model that avoids explicitly modeling the noise transition matrix. Our algorithm leverages causal representation learning and simultaneously identifies the high-level content and style latent factors from the data. It exploits the supervision information of noisy labels with structural causal models. Empirical evaluations on a wide range of synthetic and real-world instance-dependent label noise datasets demonstrate that the proposed algorithm significantly outperforms state-of-the-art counterparts.