Weakly supervised learning faces two obstacles to practical deployment: existing methods rarely apply across scenarios with diverse forms of weak supervision, and their algorithmic complexity limits scalability. This paper introduces a general framework for learning from weak supervision (GLWS) with a novel algorithm. Central to GLWS is an Expectation-Maximization (EM) formulation that accommodates various sources of weak supervision, including instance partial labels, aggregate statistics, pairwise observations, and unlabeled data. We further present an algorithm that substantially reduces the computational cost of EM by using a Non-deterministic Finite Automaton (NFA) together with a forward-backward algorithm, lowering the time complexity from the quadratic or factorial cost often required by existing solutions to linear. The problem of learning from arbitrary weak supervision thus reduces to modeling that supervision as an NFA. GLWS not only improves the scalability of machine learning models but also demonstrates superior performance and versatility across 11 weak supervision scenarios. We hope our work paves the way for further advances and practical deployment in this field.
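To make the forward-backward idea concrete, the sketch below computes the E-step posteriors for one simple kind of aggregate supervision: a bag of instances known to contain at least one positive label. It is an illustration under stated assumptions, not the GLWS implementation. It assumes binary labels and independent per-instance probabilities from a classifier, and the names `forward_backward` and `at_least_one` are hypothetical. The two-state automaton used here happens to be deterministic, which is a special case of an NFA.

```python
def forward_backward(probs, transition, n_states, start, accepting):
    """Posterior P(y_t = 1 | label sequence is accepted by the automaton).

    probs[t] is the classifier's P(y_t = 1); labels are assumed independent.
    transition(state, label) returns the next automaton state.
    Cost is O(len(probs) * n_states), i.e. linear in the bag size.
    """
    n = len(probs)

    # fwd[t][s]: probability of being in state s after reading labels 0..t-1
    fwd = [[0.0] * n_states for _ in range(n + 1)]
    fwd[0][start] = 1.0
    for t, p in enumerate(probs):
        for s in range(n_states):
            for label, p_label in ((0, 1.0 - p), (1, p)):
                fwd[t + 1][transition(s, label)] += fwd[t][s] * p_label

    # bwd[t][s]: probability of ending in an accepting state when
    # starting from state s and reading labels t..n-1
    bwd = [[0.0] * n_states for _ in range(n + 1)]
    for s in accepting:
        bwd[n][s] = 1.0
    for t in range(n - 1, -1, -1):
        p = probs[t]
        for s in range(n_states):
            bwd[t][s] = ((1.0 - p) * bwd[t + 1][transition(s, 0)]
                         + p * bwd[t + 1][transition(s, 1)])

    # z: total probability that the weak supervision is satisfied
    z = bwd[0][start]

    # Joint probability of y_t = 1 and acceptance, normalized by z
    posteriors = []
    for t, p in enumerate(probs):
        joint = sum(fwd[t][s] * p * bwd[t + 1][transition(s, 1)]
                    for s in range(n_states))
        posteriors.append(joint / z)
    return posteriors, z


def at_least_one(state, label):
    """State 0: no positive seen yet. State 1: at least one positive seen."""
    return 1 if (state == 1 or label == 1) else 0


# Hypothetical classifier outputs for a bag of three instances
bag_probs = [0.2, 0.5, 0.1]
posteriors, z = forward_backward(bag_probs, at_least_one,
                                 n_states=2, start=0, accepting={1})
print(posteriors, z)
```

For this constraint the result can be checked by hand: each posterior equals `probs[t]` divided by one minus the probability that every label is negative. Enumerating all label assignments would take time exponential in the bag size, whereas the two passes above visit each instance a constant number of times per state. Other forms of weak supervision would change only the `transition` function and the accepting states.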