Positive and Unlabeled (PU) learning, in which a binary classifier is trained with only positive and unlabeled data, generally suffers from overfitted risk estimation due to inconsistent data distributions. To address this, we introduce a pseudo-supervised PU learning framework (PSPU): we first train a PU model, use it to gather confident samples as pseudo supervision, and then apply this supervision to correct the PU model's weights via non-PU objectives. We also incorporate a consistency loss to mitigate the effects of noisy samples. PSPU significantly outperforms recent PU learning methods on MNIST, CIFAR-10, and CIFAR-100 in both balanced and imbalanced settings, and achieves competitive performance on MVTecAD for industrial anomaly detection.
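The pipeline described above can be sketched in a minimal form. This is an illustrative NumPy sketch under assumptions not stated in the abstract: the function names, the confidence thresholds, the choice of binary cross-entropy as the non-PU objective, and mean-squared disagreement as the consistency loss are all hypothetical stand-ins, not the paper's actual implementation.

```python
import numpy as np

def select_confident(scores, pos_thr=0.9, neg_thr=0.1):
    """Pick unlabeled samples the trained PU model scores confidently.
    Thresholds are illustrative. Returns indices into `scores` and
    pseudo-labels (1 = positive, 0 = negative)."""
    pos_idx = np.where(scores >= pos_thr)[0]
    neg_idx = np.where(scores <= neg_thr)[0]
    idx = np.concatenate([pos_idx, neg_idx])
    labels = np.concatenate([np.ones(len(pos_idx)), np.zeros(len(neg_idx))])
    return idx, labels

def pseudo_supervised_loss(scores, pseudo_labels, eps=1e-7):
    """Binary cross-entropy on the pseudo-labeled subset: a non-PU
    objective, since the pseudo-labels are treated as ground truth."""
    s = np.clip(scores, eps, 1.0 - eps)
    return float(-np.mean(pseudo_labels * np.log(s)
                          + (1.0 - pseudo_labels) * np.log(1.0 - s)))

def consistency_loss(scores_view_a, scores_view_b):
    """Mean squared disagreement between predictions on two augmented
    views of the same samples, damping the effect of noisy pseudo-labels."""
    return float(np.mean((scores_view_a - scores_view_b) ** 2))
```

In a training loop, the PU model's scores on the unlabeled pool would first pass through `select_confident`, and the weighted sum of `pseudo_supervised_loss` and `consistency_loss` would then be backpropagated to correct the PU model's weights.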