Weakly supervised learning faces two recurring obstacles to practical deployment: existing methods rarely apply across scenarios with different forms of weak supervision, and their algorithmic complexity limits scalability. This paper introduces a general framework for learning from weak supervision (GLWS) together with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various sources of weak supervision, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an algorithm that substantially reduces the computational cost of EM by modeling the supervision with a Non-deterministic Finite Automaton (NFA) and applying a forward-backward algorithm, lowering the time complexity from the quadratic or factorial cost often required by existing solutions to linear. Learning from arbitrary weak supervision is thereby reduced to modeling that supervision as an NFA. GLWS not only improves the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advances and practical deployment in this field.
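To make the automaton idea concrete, the following is a minimal sketch of an E-step for one kind of aggregate supervision, and it is not the authors' implementation. It assumes binary labels and a single bag whose only supervision is the count `k` of positive instances. The function name `count_posteriors` and its signature are hypothetical. The counting automaton used here is deterministic, which is a special case of an NFA: state `c` records how many positives have been read so far, and state `k` is the only accepting state. A forward pass and a backward pass over these states give the posterior of each label in time proportional to `n * k`, instead of enumerating every labeling that contains exactly `k` positives.

```python
from typing import List, Tuple


def count_posteriors(probs: List[float], k: int) -> Tuple[List[float], float]:
    """E-step marginals P(y_i = 1 | exactly k positives) for one bag.

    probs[i] is the model's current P(y_i = 1). Automaton state c is the
    number of positives consumed so far; state k is the only accepting state.
    """
    n = len(probs)
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and len(probs)")

    # forward[i][c]: probability that the first i labels contain c positives.
    forward = [[0.0] * (k + 1) for _ in range(n + 1)]
    forward[0][0] = 1.0
    for i, p in enumerate(probs):
        for c in range(k + 1):
            mass = forward[i][c]
            if mass == 0.0:
                continue
            forward[i + 1][c] += mass * (1.0 - p)  # label 0: stay in state c
            if c < k:
                forward[i + 1][c + 1] += mass * p  # label 1: move to c + 1

    # backward[i][c]: probability that labels i..n-1 take state c to state k.
    backward = [[0.0] * (k + 1) for _ in range(n + 1)]
    backward[n][k] = 1.0
    for i in range(n - 1, -1, -1):
        p = probs[i]
        for c in range(k + 1):
            stay = (1.0 - p) * backward[i + 1][c]
            move = p * backward[i + 1][c + 1] if c < k else 0.0
            backward[i][c] = stay + move

    evidence = forward[n][k]  # P(exactly k positives) under the current model
    if evidence == 0.0:
        raise ValueError("the observed count has zero probability under probs")

    # Posterior of y_i = 1: prefix reaches c, label i is 1, suffix finishes at k.
    posteriors = []
    for i, p in enumerate(probs):
        joint = sum(forward[i][c] * p * backward[i + 1][c + 1] for c in range(k))
        posteriors.append(joint / evidence)
    return posteriors, evidence


# Example: three instances, supervision says exactly one of them is positive.
posteriors, evidence = count_posteriors([0.9, 0.2, 0.4], k=1)
```

In an EM loop, the returned posteriors would serve as soft targets for the M-step, and `evidence` is the likelihood of the observed count under the current model. By construction the posteriors sum to `k`. Other supervision types would need a different automaton, which this sketch does not cover.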