Multi-label classification (MLC) suffers from inevitable label noise in training data because annotating the many semantic labels in each image is difficult. To mitigate the influence of noisy labels, existing methods mainly focus on identifying and correcting label mistakes with a trained MLC model. However, these methods still involve noisy labels during training, which can lead to imprecise identification of the noisy labels and degrade performance. In this paper, observing that negative labels substantially outnumber positive labels and that most noisy labels occur among the negative labels, we directly discard all negative labels in the dataset and propose a new method dubbed positive and unlabeled multi-label classification (PU-MLC). By extending positive-unlabeled learning to the MLC task, our method trains the model with only positive labels and unlabeled data. It also introduces an adaptive re-balance factor and an adaptive temperature coefficient into the loss function to alleviate the catastrophic imbalance in the label distribution and the over-smoothing of probabilities during training. PU-MLC is simple and effective, and it is applicable to both MLC and MLC with partial labels (MLC-PL). Extensive experiments on the MS-COCO and PASCAL VOC datasets demonstrate that PU-MLC achieves significant improvements in both the MLC and MLC-PL settings, even with fewer annotations. Code will be released.
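To make the setting concrete, the sketch below shows what training "with only positive labels and unlabeled data" can look like for a single image. It is a minimal illustration under stated assumptions, not the PU-MLC loss: the abstract does not give the formulas for the adaptive re-balance factor or the adaptive temperature coefficient. The sketch instead uses a weighted assume-negative loss, a common positive-unlabeled baseline. The function names `log_sigmoid` and `pu_multilabel_loss`, the choice of the positive-to-unlabeled count ratio as the re-balance weight, and the fixed `temperature` argument are all hypothetical placeholders.

```python
import math
from typing import Sequence


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pu_multilabel_loss(
    logits: Sequence[float],
    observed: Sequence[int],
    temperature: float = 1.0,
) -> float:
    """Hypothetical weighted assume-negative loss for one image.

    observed[c] == 1 marks an annotated positive for class c; observed[c] == 0
    marks class c as unlabeled (all negative labels have been discarded).
    This is NOT the PU-MLC formulation, only an illustrative PU baseline.
    """
    if len(logits) != len(observed):
        raise ValueError("logits and observed must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")

    # Temperature-scaled logits; a fixed value stands in for the paper's
    # adaptive temperature coefficient, whose rule is not given in the abstract.
    scaled = [z / temperature for z in logits]
    pos = [z for z, y in zip(scaled, observed) if y == 1]
    unl = [z for z, y in zip(scaled, observed) if y == 0]

    # Hypothetical re-balance weight: unlabeled classes usually vastly outnumber
    # positives, so their assume-negative term is scaled by |pos| / |unl|.
    weight = len(pos) / len(unl) if unl else 0.0

    # Positive term: -log(sigmoid(z)).
    pos_loss = -sum(log_sigmoid(z) for z in pos)
    # Unlabeled term treated as negative: -log(1 - sigmoid(z)) == -log_sigmoid(-z).
    unl_loss = -sum(log_sigmoid(-z) for z in unl)

    return (pos_loss + weight * unl_loss) / len(logits)


if __name__ == "__main__":
    # One image, four classes, only class 0 annotated as positive.
    print(pu_multilabel_loss([2.0, -1.0, 0.5, -3.0], [1, 0, 0, 0]))
```

The down-weighting keeps the many unlabeled classes, most of which are true negatives, from dominating the gradient. This is the kind of label-distribution imbalance the abstract's adaptive re-balance factor is meant to address, although the paper's actual factor may differ.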