Semi-supervised learning and weakly supervised learning are important paradigms that aim to reduce the growing demand for labeled data in current machine learning applications. In this paper, we introduce a novel analysis of the classical label propagation algorithm (LPA) (Zhu & Ghahramani, 2002) that additionally takes advantage of useful prior information, specifically probabilistic hypothesized labels on the unlabeled data. We provide an error bound that exploits both the local geometric properties of the underlying graph and the quality of the prior information. We also propose a framework for incorporating multiple sources of noisy information. In particular, we consider the weak supervision setting, in which the sources of information are weak labelers. We demonstrate the effectiveness of our approach on multiple benchmark weakly supervised classification tasks, showing improvements over existing semi-supervised and weakly supervised methods.
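For readers unfamiliar with the classical LPA referenced above, the following is a minimal sketch of the standard iterative scheme: propagate soft labels along a row-normalized graph, then clamp the labeled nodes back to their known labels. It covers only the classical algorithm and does not reproduce the prior-aware analysis or the weak-supervision framework of this paper. The function name `label_propagation`, its parameters, and the toy path graph are illustrative assumptions.

```python
import numpy as np


def label_propagation(W, Y_labeled, labeled_idx, n_iter=1000, tol=1e-8):
    """Classical label propagation sketch (hypothetical helper).

    W           : (n, n) symmetric, nonnegative affinity matrix; every node
                  is assumed to have positive degree.
    Y_labeled   : (l, c) one-hot labels for the nodes in labeled_idx.
    labeled_idx : list of l node indices with known labels.
    Returns an (n, c) matrix of soft labels.
    """
    n = W.shape[0]
    c = Y_labeled.shape[1]

    # Row-normalize the affinities into a transition matrix P = D^{-1} W.
    P = W / W.sum(axis=1, keepdims=True)

    # Unlabeled nodes start at the uniform distribution over classes.
    F = np.full((n, c), 1.0 / c)
    F[labeled_idx] = Y_labeled

    for _ in range(n_iter):
        F_new = P @ F                    # propagate labels to neighbors
        F_new[labeled_idx] = Y_labeled   # clamp the known labels
        converged = np.abs(F_new - F).max() < tol
        F = F_new
        if converged:
            break
    return F


# Toy example: a path graph 0 - 1 - 2 - 3 with the two endpoints labeled.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
])
Y_labeled = np.array([[1.0, 0.0],   # node 0 belongs to class 0
                      [0.0, 1.0]])  # node 3 belongs to class 1
F = label_propagation(W, Y_labeled, labeled_idx=[0, 3])
predictions = F.argmax(axis=1)
```

On this toy graph the unlabeled nodes converge to the harmonic solution, so node 1 leans toward class 0 and node 2 toward class 1. How probabilistic prior labels on the unlabeled nodes enter the procedure is not specified in the abstract and is deliberately left out of this sketch.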