Semi-supervised learning and weakly supervised learning are important paradigms that aim to reduce the growing demand for labeled data in current machine learning applications. In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that additionally takes advantage of useful prior information, specifically probabilistic hypothesized labels on the unlabeled data. We provide an error bound that exploits both the local geometric properties of the underlying graph and the quality of the prior information. We also propose a framework for incorporating multiple sources of noisy information. In particular, we consider the weak supervision setting, in which the sources of information are weak labelers. We demonstrate the effectiveness of our approach on multiple benchmark weakly supervised classification tasks, showing improvements over existing semi-supervised and weakly supervised methods.
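To make the idea of label propagation with probabilistic prior labels concrete, the following is a minimal sketch and not the method analyzed in the paper. It assumes a symmetric non-negative weight matrix `W`, a row-stochastic matrix `prior` of hypothesized class probabilities, and a mixing weight `alpha`; the function name `propagate_labels`, the update rule that blends neighbor averages with the prior, and all parameter names are illustrative assumptions. Labeled nodes are clamped to their known labels, as in classical LPA.

```python
import numpy as np


def propagate_labels(W, labels, prior, alpha=0.8, n_iter=200):
    """Illustrative label propagation with prior label distributions.

    W      : (n, n) symmetric, non-negative edge weights.
    labels : (n,) integer class ids, with -1 marking unlabeled nodes.
    prior  : (n, k) row-stochastic hypothesized label probabilities.
    alpha  : assumed mixing weight between neighbor averages and the prior.
    """
    W = np.asarray(W, dtype=float)
    prior = np.asarray(prior, dtype=float)
    labels = np.asarray(labels)
    k = prior.shape[1]
    labeled = labels >= 0

    # Row-normalize the weights into a transition matrix.
    deg = W.sum(axis=1, keepdims=True)
    P = W / np.where(deg > 0, deg, 1.0)

    # Anchor matrix: one-hot rows for labeled nodes, prior rows otherwise.
    Y = prior.copy()
    Y[labeled] = np.eye(k)[labels[labeled]]

    F = Y.copy()
    for _ in range(n_iter):
        # Blend the neighbor average with the anchor (prior) distribution.
        F = alpha * (P @ F) + (1.0 - alpha) * Y
        # Clamp labeled nodes to their known labels.
        F[labeled] = Y[labeled]
    return F


# Hypothetical usage: a 4-node chain with the two end nodes labeled.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
labels = np.array([0, -1, -1, 1])
prior = np.full((4, 2), 0.5)  # uninformative prior on every node
F = propagate_labels(W, labels, prior)
predictions = F.argmax(axis=1)
```

In this sketch, setting `alpha` close to 1 relies mostly on the graph, while smaller values keep the unlabeled nodes closer to their prior; how the paper weighs these two sources is not specified in the abstract.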