Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels, utilizing weak labels from heuristics, crowdsourcing, or pre-trained models. However, the absence of ground truth complicates model evaluation, as traditional metrics such as accuracy, precision, and recall cannot be directly calculated. In this work, we present a novel method to address this challenge by framing model evaluation as a partial identification problem and estimating performance bounds using Fréchet bounds. Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques. Through scalable convex optimization, we obtain accurate and computationally efficient bounds for metrics including accuracy, precision, recall, and F1-score, even in high-dimensional settings. This framework offers a robust approach to assessing model quality without ground truth labels, enhancing the practicality of weakly supervised learning for real-world applications.
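To give intuition for the partial-identification idea, the sketch below works out the classical two-marginal special case of Fréchet bounds: given only the classifier's prediction rate P(ŷ=1) and an estimate of the true class prevalence P(y=1), the joint cell P(ŷ=1, y=1) is bounded by max(0, p+q−1) from below and min(p, q) from above, which in turn bounds accuracy. This is a simplified illustration, not the paper's full convex-optimization method; the function name and the assumption that the prevalence is known are ours.

```python
def frechet_accuracy_bounds(p_pred: float, p_true: float) -> tuple[float, float]:
    """Fréchet bounds on binary accuracy P(yhat = y), given only marginals.

    p_pred: marginal P(yhat = 1) of the classifier (observable without labels).
    p_true: marginal P(y = 1), e.g. estimated from a weak-label model (assumed).

    With pi11 = P(yhat=1, y=1), the table marginals force
    pi00 = 1 - p_pred - p_true + pi11, so
        accuracy = pi11 + pi00 = 1 - p_pred - p_true + 2 * pi11,
    and the Fréchet inequalities max(0, p+q-1) <= pi11 <= min(p, q)
    translate directly into bounds on accuracy.
    """
    p, q = p_pred, p_true
    pi11_lo = max(0.0, p + q - 1.0)   # lower Fréchet bound on the joint cell
    pi11_hi = min(p, q)               # upper Fréchet bound on the joint cell
    acc_lo = 1.0 - p - q + 2.0 * pi11_lo
    acc_hi = 1.0 - p - q + 2.0 * pi11_hi
    return acc_lo, acc_hi


if __name__ == "__main__":
    # A classifier predicting positive 60% of the time on a 50%-prevalence task:
    # without labels, accuracy is only known to lie in [0.1, 0.9].
    lo, hi = frechet_accuracy_bounds(0.6, 0.5)
    print(lo, hi)
```

The wide interval in this toy case shows why the marginals alone are often uninformative; the paper's contribution is to tighten such bounds by conditioning on high-dimensional weak-label information via scalable convex optimization.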