Positive-Unlabeled (PU) learning aims to train a binary classifier (positive vs. negative) when only limited positive data and abundant unlabeled data are available. While widely applicable, state-of-the-art PU learning methods substantially underperform their supervised counterparts on complex datasets, especially without auxiliary negatives or pre-estimated parameters (e.g., a 14.26% gap on the CIFAR-100 dataset). We identify the primary bottleneck as the challenge of learning discriminative representations under unreliable supervision. To tackle this challenge, we propose NcPU, a non-contrastive PU learning framework that requires no auxiliary information. NcPU combines a noisy-pair robust supervised non-contrastive loss (NoiSNCL), which aligns intra-class representations despite unreliable supervision, with a phantom label disambiguation (PLD) scheme that supplies conservative negative supervision via regret-based label updates. Theoretically, NoiSNCL and PLD iteratively benefit each other when viewed through the lens of the Expectation-Maximization framework. Empirically, extensive experiments demonstrate that: (1) NoiSNCL enables even simple PU methods to achieve competitive performance; and (2) NcPU achieves substantial improvements over state-of-the-art PU methods across diverse datasets, including challenging post-disaster building damage mapping datasets, highlighting its promise for real-world applications. Code will be open-sourced after review.