Distributed statistical analyses offer a promising approach to privacy protection when analysing data distributed over several databases. They bring the analysis to the data rather than the data to the analysis. The analyst receives anonymous summary statistics, which are combined into an aggregated result. We are interested in calculating the AUC of a prediction score with a distributed approach, without access to the individual-level data of the subjects held in the different databases. We use DataSHIELD as the technology to carry out distributed analyses and use newly developed algorithms to validate the prediction score. Calibration can easily be implemented in the distributed setting, but discrimination, represented by the ROC curve and its AUC, is challenging. We base our approach on the ROC-GLM algorithm as well as on ideas from differential privacy. The proposed algorithms are evaluated in a simulation study. A real-world application is described: the audit use case of DIFUTURE (Medical Informatics Initiative), whose goal is to validate a treatment prediction rule for patients with newly diagnosed multiple sclerosis.
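To illustrate why calibration is the easy part in a distributed setting, the following minimal Python sketch computes a pooled calibration-in-the-large (mean observed outcome minus mean predicted probability) from per-site aggregates only. It is an assumption-laden illustration, not the DataSHIELD API and not the algorithm proposed here: the names `SiteSummary`, `summarize_site`, and `pooled_calibration_in_the_large` are hypothetical, and the sketch contains no disclosure checks or differential-privacy noise.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class SiteSummary:
    """Aggregates a site shares with the analyst (hypothetical format)."""
    n: int                # number of subjects at the site
    sum_observed: float   # sum of the binary (0/1) outcomes
    sum_predicted: float  # sum of the predicted probabilities


def summarize_site(outcomes: Sequence[int], scores: Sequence[float]) -> SiteSummary:
    """Runs inside a site; only three aggregate numbers leave the database."""
    if len(outcomes) != len(scores):
        raise ValueError("outcomes and scores must have the same length")
    return SiteSummary(
        n=len(outcomes),
        sum_observed=float(sum(outcomes)),
        sum_predicted=float(sum(scores)),
    )


def pooled_calibration_in_the_large(summaries: Iterable[SiteSummary]) -> float:
    """Runs at the analyst: pooled mean outcome minus pooled mean prediction."""
    summaries = list(summaries)
    n_total = sum(s.n for s in summaries)
    if n_total == 0:
        raise ValueError("no subjects in any site summary")
    observed = sum(s.sum_observed for s in summaries)
    predicted = sum(s.sum_predicted for s in summaries)
    return (observed - predicted) / n_total


# Toy data for two sites (made up for illustration only).
site_a = summarize_site([1, 0, 1, 1], [0.75, 0.25, 0.5, 0.5])
site_b = summarize_site([0, 0, 1, 0], [0.25, 0.25, 0.75, 0.25])
calibration_gap = pooled_calibration_in_the_large([site_a, site_b])
```

The pooled value is exact because sums decompose over sites. The AUC does not decompose in this way, since it depends on how the scores of cases and non-cases rank against each other across all sites, which is why discrimination needs a dedicated approach.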