Partial-label learning (PLL) is an important weakly supervised learning problem in which each training example is associated with a candidate label set rather than a single ground-truth label. Identification-based methods, which treat the true label as a latent variable to be identified, have been widely explored to address label ambiguity in PLL. However, identifying the true labels accurately and completely remains challenging, and the resulting errors introduce noise into the pseudo labels used during model training. In this paper, we propose a new method called CroSel, which leverages the model's historical predictions to identify the true labels of most training examples. First, we introduce a cross selection strategy, in which two deep models select the true labels of partially labeled data for each other. Second, we propose a novel consistency regularization term called co-mix to avoid sample waste and to mitigate the slight noise caused by false selection. In this way, CroSel can pick out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, on CIFAR-type datasets under various settings, our method exceeds 90% in both the accuracy of the selected true labels and the proportion of training examples selected.
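The cross selection idea can be illustrated with a minimal sketch. The abstract does not state the exact selection criteria, so the three conditions used here are assumptions made for illustration: the predicted label is stable across the stored epochs, it lies inside the candidate set, and its mean confidence exceeds a threshold. The names `select_confident`, `cross_select`, and `threshold` are hypothetical, and the co-mix regularization term is not covered.

```python
import numpy as np

def select_confident(history, candidate_mask, threshold=0.9):
    """Pick examples whose label one model has predicted consistently.

    history:        (t, n, c) softmax outputs over the last t epochs.
    candidate_mask: (n, c) boolean array, True for candidate labels.
    Returns (indices of selected examples, their selected labels).
    The three criteria below are assumptions, not the paper's definition.
    """
    preds = history.argmax(axis=2)                 # (t, n) predicted labels
    latest = preds[-1]                             # (n,) most recent prediction
    rows = np.arange(latest.shape[0])

    stable = (preds == latest).all(axis=0)         # same label in every epoch
    in_candidates = candidate_mask[rows, latest]   # label is a candidate
    mean_conf = history[:, rows, latest].mean(axis=0)
    confident = mean_conf > threshold              # high average confidence

    idx = np.flatnonzero(stable & in_candidates & confident)
    return idx, latest[idx]

def cross_select(history_a, history_b, candidate_mask, threshold=0.9):
    """Each model is trained on the labels selected by the other model."""
    sel_by_a = select_confident(history_a, candidate_mask, threshold)
    sel_by_b = select_confident(history_b, candidate_mask, threshold)
    return {"train_a": sel_by_b, "train_b": sel_by_a}
```

Swapping the selections between the two models is the part that corresponds to "select true labels for each other"; everything else in the sketch is one plausible way to turn historical predictions into a selection rule.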