Computing the AUC as a performance measure to compare different machine learning models is one of the final steps of many research projects. Many of these models are trained on privacy-sensitive data, and when the datasets cannot be shared or pooled in one place for training and/or testing, several approaches are available, such as $\epsilon$-differential privacy, federated machine learning, and cryptography. In this setting, computing the global AUC can itself be a problem, since the labels might also contain privacy-sensitive information. Approaches based on $\epsilon$-differential privacy have addressed this problem, but to the best of our knowledge, no exact privacy-preserving solution has been introduced. In this paper, we propose an MPC-based solution, called ppAURORA, that privately merges individually sorted lists from multiple sources to compute the exact AUC that one would obtain on the pooled original test samples. With ppAURORA, the exact area under the precision-recall and receiver operating characteristic curves can be computed even when ties between prediction confidence values exist. We use ppAURORA to evaluate two different models predicting acute myeloid leukemia therapy response and heart disease, respectively. We also assess its scalability via synthetic data experiments. All these experiments show that, in the semi-honest adversary setting, we efficiently and privately compute exactly the same AUC for both evaluation metrics as one would obtain on the pooled test samples in plaintext.
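To make the target quantity concrete, the following is a minimal plaintext sketch of the computation that the abstract describes as the reference result: merging individually sorted prediction lists from several sources and computing the exact ROC AUC with ties handled by grouping equal confidence values. It is not the ppAURORA protocol and involves no MPC or secret sharing; the function names `merge_sorted_sources` and `auroc_with_ties` and the example data are assumptions introduced here for illustration only.

```python
import heapq
from itertools import groupby


def merge_sorted_sources(sources):
    """Merge per-source lists of (score, label) pairs.

    Assumption: each source list is already sorted by descending score,
    and labels are 1 (positive) or 0 (negative).
    """
    # heapq.merge expects ascending keys, so negate the score.
    return list(heapq.merge(*sources, key=lambda pair: -pair[0]))


def auroc_with_ties(merged):
    """Exact ROC AUC from a descending-sorted list of (score, label) pairs.

    Samples sharing the same score are processed as one group, which
    corresponds to a trapezoid (a diagonal segment) on the ROC curve.
    """
    total_pos = sum(label for _, label in merged)
    total_neg = len(merged) - total_pos
    if total_pos == 0 or total_neg == 0:
        raise ValueError("need at least one positive and one negative sample")

    tp = 0        # positives ranked strictly above the current tie group
    area = 0.0    # unnormalised area, in units of (positive, negative) pairs
    for _, group in groupby(merged, key=lambda pair: pair[0]):
        labels = [label for _, label in group]
        pos = sum(labels)
        neg = len(labels) - pos
        # Each negative in the group is beaten by all earlier positives
        # and counts as half a win against each tied positive.
        area += neg * (tp + pos / 2.0)
        tp += pos
    return area / (total_pos * total_neg)


# Hypothetical data from two sources, each sorted by descending score.
source_a = [(0.9, 1), (0.6, 0), (0.4, 1)]
source_b = [(0.8, 1), (0.6, 1), (0.2, 0)]

merged = merge_sorted_sources([source_a, source_b])
print(auroc_with_ties(merged))  # 0.8125
```

In this example the tie at score 0.6 between one positive and one negative sample contributes half a correctly ordered pair, which is why the result is 6.5 / 8 rather than 6 / 8 or 7 / 8. A privacy-preserving protocol in the sense of the abstract would need to reproduce this same value without revealing the individual scores or labels.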