In machine learning and statistical modeling, practitioners often assume that static, labeled data are readily available for training and evaluation. In practice this assumption frequently fails: data may be private, encrypted, difficult to measure, or unlabeled. In this paper, we bridge this gap by adapting the Hui-Walter paradigm, a method traditionally applied in epidemiology and medicine, to machine learning. This approach lets us estimate key performance metrics such as the false positive rate, false negative rate, and priors when no ground truth is available. We further extend the paradigm to handle online data, opening up new possibilities for dynamic data environments. Our methodology partitions the data into latent classes to simulate multiple data populations (when natural populations are unavailable) and independently trains models to play the role of multiple tests. By cross-tabulating the binary outcomes of the ensemble of classifiers across the populations, we estimate the unknown parameters through Gibbs sampling, without requiring ground-truth or labeled data. This paper showcases the potential of our methodology to change machine learning practice by enabling accurate model assessment under dynamic and uncertain data conditions.
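To make the estimation step in the abstract concrete, the following is a minimal sketch of a Gibbs sampler for the classical two-test, two-population Hui-Walter setting. It is not the authors' implementation. The function names (`simulate_counts`, `hui_walter_gibbs`), the uniform Beta(1, 1) priors, the starting values, and the assumption that the two classifiers err independently given the true class are all illustrative choices made here. The input is the cross-tabulation of binary outcomes: `counts[k, a, b]` is the number of items in population `k` for which classifier 1 output `a` and classifier 2 output `b`.

```python
import numpy as np


def _cell_likelihoods(se, sp):
    """P(test1 = a, test2 = b | true class) as 2x2 arrays.

    Assumes the two tests err independently given the true class.
    """
    a = np.array([0, 1])[:, None]  # test-1 outcome indexes rows
    b = np.array([0, 1])[None, :]  # test-2 outcome indexes columns
    lik_pos = (se[0] ** a * (1 - se[0]) ** (1 - a)) * (se[1] ** b * (1 - se[1]) ** (1 - b))
    lik_neg = ((1 - sp[0]) ** a * sp[0] ** (1 - a)) * ((1 - sp[1]) ** b * sp[1] ** (1 - b))
    return lik_pos, lik_neg


def simulate_counts(n_per_pop, prev, se, sp, seed=0):
    """Hypothetical helper: draw one 2x2 table per population from known parameters."""
    rng = np.random.default_rng(seed)
    lik_pos, lik_neg = _cell_likelihoods(np.asarray(se), np.asarray(sp))
    tables = []
    for p in prev:
        cell_p = (p * lik_pos + (1 - p) * lik_neg).ravel()
        tables.append(rng.multinomial(n_per_pop, cell_p).reshape(2, 2))
    return np.stack(tables)


def hui_walter_gibbs(counts, n_iter=3000, burn_in=1000, seed=0):
    """Posterior means of priors, FPRs and FNRs from unlabeled cross-tabulated counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    prev = np.full(counts.shape[0], 0.5)  # one prior (prevalence) per population
    se = np.array([0.8, 0.8])             # sensitivities, started above 0.5
    sp = np.array([0.8, 0.8])             # specificities, started above 0.5
    draws = []
    for it in range(n_iter):
        # Step 1: impute how many items in each cell are truly positive.
        lik_pos, lik_neg = _cell_likelihoods(se, sp)
        w_pos = prev[:, None, None] * lik_pos
        w_neg = (1 - prev)[:, None, None] * lik_neg
        z = rng.binomial(counts, w_pos / (w_pos + w_neg))
        h = counts - z  # imputed true negatives per cell
        # Step 2: conjugate Beta updates under uniform Beta(1, 1) priors.
        prev = rng.beta(1 + z.sum(axis=(1, 2)), 1 + h.sum(axis=(1, 2)))
        se = np.array([
            rng.beta(1 + z[:, 1, :].sum(), 1 + z[:, 0, :].sum()),
            rng.beta(1 + z[:, :, 1].sum(), 1 + z[:, :, 0].sum()),
        ])
        sp = np.array([
            rng.beta(1 + h[:, 0, :].sum(), 1 + h[:, 1, :].sum()),
            rng.beta(1 + h[:, :, 0].sum(), 1 + h[:, :, 1].sum()),
        ])
        if it >= burn_in:
            draws.append(np.concatenate([prev, 1 - sp, 1 - se]))
    mean = np.mean(draws, axis=0)
    k = counts.shape[0]
    return {"prior": mean[:k], "fpr": mean[k:k + 2], "fnr": mean[k + 2:]}


if __name__ == "__main__":
    # Synthetic check: the true labels are never shown to the sampler.
    counts = simulate_counts(50_000, prev=[0.2, 0.7], se=[0.9, 0.85], sp=[0.95, 0.9])
    print(hui_walter_gibbs(counts))
```

In this sketch the two populations must have different priors for the parameters to be identifiable, which is why the synthetic example uses prevalences of 0.2 and 0.7. Starting the sensitivities and specificities above 0.5 is a simple way to steer the chain away from the mirrored solution in which the class labels are swapped.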