The scarcity of class-labeled data is a ubiquitous bottleneck in many machine learning problems. While abundant unlabeled data typically exist and offer a potential solution, they are highly challenging to exploit. In this paper, we address this problem by leveraging Positive-Unlabeled~(PU) classification and conditional generation with extra unlabeled data \emph{simultaneously}. In particular, we present a novel training framework that jointly targets PU classification and conditional generation when exposed to extra data, especially out-of-distribution unlabeled data, by exploring the interplay between the two tasks: 1) enhancing the performance of PU classifiers with the assistance of a novel Classifier-Noise-Invariant Conditional GAN~(CNI-CGAN) that is robust to noisy labels, and 2) leveraging extra data with labels predicted by a PU classifier to help the generation. Theoretically, we prove the optimal condition of CNI-CGAN; experimentally, we conduct extensive evaluations on diverse datasets, verifying simultaneous improvements in both classification and generation.