We develop methods for estimating Fréchet bounds on (possibly high-dimensional) distribution classes in which some variables are continuous-valued. We establish the statistical correctness of the computed bounds under uncertainty in the marginal constraints and demonstrate the usefulness of our algorithms by evaluating the performance of machine learning (ML) models trained with programmatic weak supervision (PWS). PWS is a framework for principled learning from weak supervision inputs (e.g., crowdsourced labels, knowledge bases, and pre-trained models on related tasks), and it has achieved remarkable success in many areas of science and engineering. Unfortunately, it is generally difficult to validate the performance of ML models trained with PWS due to the absence of labeled data. Our algorithms address this issue by estimating sharp lower and upper bounds for performance metrics such as accuracy, recall, precision, and F1 score.
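To make the idea of bounding a performance metric from marginal information concrete, the following is a minimal sketch of the classical, fully discrete Fréchet bounds applied to the accuracy of a binary classifier. It is not the method developed in this work, which handles continuous-valued variables and uncertainty in the marginals. The function names `frechet_joint_bounds` and `accuracy_bounds` are hypothetical, and the sketch assumes the marginals P(Y = 1) and P(Ŷ = 1) are known exactly.

```python
def frechet_joint_bounds(p, q):
    """Classical Frechet bounds on P(A and B) given only P(A) = p and P(B) = q."""
    lower = max(0.0, p + q - 1.0)
    upper = min(p, q)
    return lower, upper


def accuracy_bounds(p_label, p_pred):
    """Sharp bounds on accuracy P(Y == Yhat) for binary Y and Yhat.

    Assumes (for illustration only) that the two marginals
    p_label = P(Y = 1) and p_pred = P(Yhat = 1) are known exactly
    and that nothing else is known about the joint distribution.
    """
    lo11, hi11 = frechet_joint_bounds(p_label, p_pred)
    # accuracy = P(Y=1, Yhat=1) + P(Y=0, Yhat=0)
    #          = 2 * P(Y=1, Yhat=1) + 1 - p_label - p_pred
    base = 1.0 - p_label - p_pred
    return 2.0 * lo11 + base, 2.0 * hi11 + base


if __name__ == "__main__":
    lo, hi = accuracy_bounds(0.3, 0.4)
    print(f"accuracy lies in [{lo:.2f}, {hi:.2f}]")
```

With marginals 0.3 and 0.4, the accuracy can be anywhere between 0.3 and 0.9. The width of that interval shows why marginals alone are rarely informative and why additional constraints, such as those supplied by weak supervision sources, are needed to tighten the bounds.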