Weakly supervised learning aims to enable machine learning when perfect supervision is unavailable, and it has drawn great attention from researchers. Among the various types of weak supervision, one of the most challenging is learning from multiple unlabeled (U) datasets given only limited knowledge of the class priors, or U$^m$ learning for short. In this paper, we study the problem of building an AUC (area under the ROC curve) optimization model from multiple unlabeled datasets, so as to maximize the pairwise ranking ability of the classifier. We propose U$^m$-AUC, an AUC optimization approach that converts the U$^m$ data into a multi-label AUC optimization problem and can be trained efficiently. We show, both theoretically and empirically, that the proposed U$^m$-AUC is effective.
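To make the phrase "pairwise ranking ability" concrete, the sketch below computes the standard empirical AUC as the fraction of (positive, negative) pairs that a scorer ranks correctly, with ties counted as one half. It illustrates only the quantity being optimized and assumes fully labeled data; it is not the U$^m$-AUC method itself, and the function name `pairwise_auc` is a hypothetical helper chosen for illustration.

```python
import numpy as np


def pairwise_auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Assumes binary labels in {0, 1}; a tied pair contributes 0.5.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Score difference for every (positive, negative) pair via broadcasting.
    diff = pos[:, None] - neg[None, :]
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct) / (pos.size * neg.size)


if __name__ == "__main__":
    # Two positives (0.9, 0.3) and two negatives (0.8, 0.1).
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Because this count is a sum over pairs rather than over individual examples, AUC depends only on the ordering induced by the scores, not on a decision threshold.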