Partial-label learning (PLL) is an important weakly supervised learning problem in which each training example is annotated with a candidate label set rather than a single ground-truth label. Identification-based methods, which treat the true label as a latent variable to be identified, have been widely explored to resolve label ambiguity in PLL. However, identifying the true labels accurately and completely remains challenging, and the resulting errors introduce noise into the pseudo labels used during model training. In this paper, we propose a new method called CroSel, which leverages the models' historical prediction information to identify the true labels of most training examples. First, we introduce a cross selection strategy, in which two deep models select the true labels of partially labeled data for each other. Second, we propose a novel consistency regularization term called co-mix, which avoids sample waste and mitigates the tiny noise caused by false selection. In this way, CroSel picks out the true labels of most examples with high precision. Extensive experiments demonstrate the superiority of CroSel, which consistently outperforms previous state-of-the-art methods on benchmark datasets. Additionally, on CIFAR-type datasets under various settings, our method exceeds 90% in both the accuracy of the selected true labels and the proportion of examples selected.
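The cross selection idea can be illustrated with a minimal sketch. The abstract does not state the exact selection criteria, so the three conditions used here (the predicted label lies in the candidate set, the prediction is stable across the stored epochs, and the mean confidence reaches a threshold) are assumptions made for illustration, as are the function names `select_confident_labels` and `cross_select` and the default threshold of 0.9.

```python
from typing import Dict, List, Sequence, Set, Tuple

# history[t][i] holds the class-probability list of example i at stored epoch t.
History = Sequence[Sequence[Sequence[float]]]


def select_confident_labels(
    history: History,
    candidates: Sequence[Set[int]],
    threshold: float = 0.9,  # assumed value, not taken from the paper
) -> Dict[int, int]:
    """Return {example index: selected label} from one model's history.

    Assumed criteria (illustrative only): the predicted label is in the
    candidate set, is identical in every stored epoch, and has a mean
    confidence of at least `threshold`.
    """
    selected: Dict[int, int] = {}
    for i, cand in enumerate(candidates):
        # Predicted class (argmax) of example i in each stored epoch.
        preds: List[int] = [
            max(range(len(epoch[i])), key=epoch[i].__getitem__)
            for epoch in history
        ]
        label = preds[0]
        stable = all(p == label for p in preds)
        mean_conf = sum(epoch[i][label] for epoch in history) / len(history)
        if stable and label in cand and mean_conf >= threshold:
            selected[i] = label
    return selected


def cross_select(
    history_a: History,
    history_b: History,
    candidates: Sequence[Set[int]],
    threshold: float = 0.9,
) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Each model receives the labels selected from the OTHER model's history."""
    labels_for_a = select_confident_labels(history_b, candidates, threshold)
    labels_for_b = select_confident_labels(history_a, candidates, threshold)
    return labels_for_a, labels_for_b


# Toy data: 2 stored epochs, 3 examples, 3 classes.
history_a = [
    [[0.95, 0.03, 0.02], [0.10, 0.80, 0.10], [0.20, 0.20, 0.60]],
    [[0.97, 0.02, 0.01], [0.60, 0.30, 0.10], [0.10, 0.10, 0.80]],
]
history_b = [
    [[0.92, 0.05, 0.03], [0.02, 0.95, 0.03], [0.05, 0.05, 0.90]],
    [[0.94, 0.03, 0.03], [0.01, 0.97, 0.02], [0.03, 0.02, 0.95]],
]
candidates = [{0, 1}, {0, 1}, {1, 2}]

labels_for_a, labels_for_b = cross_select(history_a, history_b, candidates)
```

In this toy run, model A's history is unstable on the second example and under-confident on the third, so model B would be trained on fewer selected labels than model A. Examples that are not selected would be the ones a regularizer such as co-mix is meant to keep in use; that term is not sketched here because the abstract gives no formula for it.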