Kaplan-Meier estimators capture the survival behavior of a cohort and are among the key statistics in survival analysis. As with any estimator, they become more accurate as the dataset grows, which motivates multiple data holders to pool their data to compute a more accurate Kaplan-Meier estimator. However, survival datasets often contain sensitive information about individuals, and data holders are responsible for protecting it, so naively sharing the data is often not viable. In this work, we propose two novel differentially private schemes that are facilitated by our novel synthetic dataset generation method. Based on these schemes, we propose various paths that allow joint estimation of Kaplan-Meier curves with strict privacy guarantees. Our contributions include a taxonomy of methods for this task and an extensive experimental exploration and evaluation based on this taxonomy. We show that we can construct a joint, global Kaplan-Meier estimator that satisfies very tight privacy guarantees with no statistically significant utility loss compared to the non-private centralized setting.
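For reference, the standard (non-private) Kaplan-Meier product-limit estimator multiplies, at each distinct event time, the fraction of at-risk subjects that survive that time. The following is a minimal sketch in plain Python of that baseline only; it is not the differentially private schemes proposed in this work, and the function name `kaplan_meier` and the argument names `durations` and `events` are illustrative assumptions.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    durations: observed follow-up time for each subject
    events:    1 if the event was observed at that time, 0 if censored
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")

    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(durations, events) if e)
    # Number of subjects leaving the risk set at each time (event or censored).
    exits = Counter(durations)

    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: S(t) = S(t-) * (1 - d_t / n_t).
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t are no longer at risk afterwards.
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy cohort: five subjects, two of them censored.
    for t, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
        print(f"t={t}: S(t)={s:.3f}")
```

Subjects censored at a time where an event also occurs are counted as still at risk at that time, which is the usual convention. The curve only steps down at times with observed events, so a larger pooled cohort yields more steps and smaller relative noise per step.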