Learning from noisy data is challenging, and such noise significantly degrades model performance. In this paper, we present TCL, a novel twin contrastive learning model that learns robust representations and handles noisy labels for classification. Specifically, we construct a Gaussian mixture model (GMM) over the representations and inject the supervised model predictions into it, linking the label-free latent variables of the GMM with the label-noisy annotations. TCL then detects examples with wrong labels as out-of-distribution examples using a second, two-component GMM that takes the data distribution into account. We further propose cross-supervision with an entropy regularization loss, which bootstraps the true targets from model predictions to handle the noisy labels. As a result, TCL learns discriminative representations aligned with the estimated labels through mixup and contrastive learning. Extensive experiments on several standard benchmarks and real-world datasets demonstrate the superior performance of TCL. In particular, TCL achieves a 7.5\% improvement on CIFAR-10 with 90\% noisy labels, an extremely noisy scenario. The source code is available at \url{https://github.com/Hzzone/TCL}.
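To make the detection step more concrete, the following is a minimal sketch of how a two-component GMM can separate likely-clean from likely-mislabeled examples. It is not the TCL implementation: the per-example score (here a hypothetical scalar such as a loss or an out-of-distribution score), the function names `fit_two_component_gmm` and `clean_probability`, and the initialization are all assumptions made for illustration. The sketch fits a one-dimensional mixture with EM and reads the posterior of the low-score component as the probability that a label is clean.

```python
import math


def _log_density(s, weight, mean, var):
    """Log of weight * N(s | mean, var) for a 1-D Gaussian."""
    return (math.log(weight) - 0.5 * math.log(2.0 * math.pi * var)
            - (s - mean) ** 2 / (2.0 * var))


def _posterior(s, weights, means, variances):
    """Posterior responsibility of each of the two components for score s."""
    logs = [_log_density(s, weights[k], means[k], variances[k]) for k in range(2)]
    top = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(v - top) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fit_two_component_gmm(scores, n_iter=50, eps=1e-6):
    """Fit a 1-D two-component GMM with EM (assumed setup, not the paper's)."""
    lo, hi = min(scores), max(scores)
    means = [lo, hi]  # initialize one component at each extreme
    start_var = max((hi - lo) ** 2 / 4.0, eps)
    variances = [start_var, start_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibilities under the current parameters.
        resp = [_posterior(s, weights, means, variances) for s in scores]
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), eps)
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * s for r, s in zip(resp, scores)) / nk
            variances[k] = max(
                sum(r[k] * (s - means[k]) ** 2 for r, s in zip(resp, scores)) / nk,
                eps,
            )
    return weights, means, variances


def clean_probability(scores, params):
    """Posterior of the lower-mean component, read as P(label is clean)."""
    weights, means, variances = params
    clean = 0 if means[0] <= means[1] else 1
    return [_posterior(s, weights, means, variances)[clean] for s in scores]


# Hypothetical scores: five low (likely clean) and three high (likely noisy).
example_scores = [0.10, 0.12, 0.09, 0.11, 0.10, 2.0, 2.1, 1.9]
example_params = fit_two_component_gmm(example_scores)
example_probs = clean_probability(example_scores, example_params)
```

Thresholding `example_probs` (for instance at 0.5, an arbitrary choice here) would yield a clean/noisy partition; in practice the score, the threshold, and the dimensionality of the mixture would follow the method's own definitions rather than this toy setup.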