Partial label learning (PLL) is a typical weakly supervised learning paradigm in which each sample is associated with a set of candidate labels. The basic assumption of PLL is that the ground-truth label resides in the candidate set. However, this assumption may not hold when annotators lack the expertise to judge reliably, which limits the practical application of PLL. In this paper, we relax this assumption and focus on a more general problem, noisy PLL, where the ground-truth label may be absent from the candidate set. To address this challenging problem, we propose a novel framework called "Iterative Refinement Network (IRNet)". It aims to purify noisy samples through two key modules: noisy sample detection and label correction. Ideally, noisy PLL reduces to traditional PLL once all noisy samples are corrected. To ensure that these modules perform reliably, we start with warm-up training and exploit data augmentation to reduce prediction errors. Through theoretical analysis, we prove that IRNet is able to reduce the noise level of the dataset and eventually approximate the Bayes optimal classifier. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our method: IRNet outperforms existing state-of-the-art approaches on noisy PLL.
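The two modules named in the abstract, noisy sample detection and label correction, can be illustrated with a minimal sketch. The abstract does not state the actual detection criterion or correction rule, so everything below is an assumption made for illustration: a sample is flagged as noisy when the model's top prediction lies outside its candidate set and its confidence reaches a hypothetical `threshold`, and correction simply adds the predicted label to the candidate set. The function name `refine_candidates` and its parameters are likewise hypothetical and are not taken from IRNet.

```python
import numpy as np


def refine_candidates(probs, candidates, threshold=0.9):
    """One illustrative refinement step for noisy PLL (assumed rule, not IRNet's).

    probs:      (n_samples, n_classes) predicted class probabilities.
    candidates: (n_samples, n_classes) 0/1 mask of candidate labels.
    threshold:  hypothetical confidence cut-off for flagging a sample.
    Returns the refined 0/1 candidate mask and a boolean "noisy" flag per sample.
    """
    probs = np.asarray(probs, dtype=float)
    refined = np.asarray(candidates).astype(bool).copy()

    rows = np.arange(probs.shape[0])
    pred = probs.argmax(axis=1)   # most likely class per sample
    conf = probs.max(axis=1)      # confidence of that prediction

    # Detection (assumption): the prediction is confident and falls
    # outside the sample's candidate set.
    outside = ~refined[rows, pred]
    noisy = outside & (conf >= threshold)

    # Correction (assumption): add the predicted label to the candidate
    # set of every flagged sample, leaving other samples unchanged.
    refined[noisy, pred[noisy]] = True
    return refined.astype(int), noisy


if __name__ == "__main__":
    probs = [[0.05, 0.90, 0.05],
             [0.60, 0.30, 0.10],
             [0.10, 0.10, 0.80]]
    candidates = [[1, 0, 1],
                  [1, 1, 0],
                  [1, 1, 0]]
    refined, noisy = refine_candidates(probs, candidates, threshold=0.85)
    print(refined)
    print(noisy)
```

In an iterative scheme such a step would be repeated as the model improves after warm-up training, so that the candidate sets gradually come to contain the ground-truth label again. How the confidence cut-off is chosen and whether labels are also removed from the candidate set are design choices this sketch leaves open.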