Kaplan-Meier estimators are essential tools in survival analysis, capturing the survival behavior of a cohort. Their accuracy improves with larger and more diverse datasets, which gives data holders an incentive to collaborate on more precise estimates. However, these datasets often contain sensitive individual information, and the data protection measures they require preclude naive data sharing. In this work, we introduce two novel differentially private methods that offer flexibility in applying differential privacy to various functions of the data. Additionally, we propose a synthetic dataset generation technique that enables easy and rapid conversion between different data representations. Building on these methods, we propose several paths to a joint estimation of Kaplan-Meier curves with strict privacy guarantees. Our contributions include a taxonomy of methods for this task and an extensive experimental evaluation organized around that taxonomy. We demonstrate that our approach can construct a joint, global Kaplan-Meier estimator that adheres to a strict privacy budget ($\varepsilon = 1$) while exhibiting no statistically significant deviation from the nonprivate centralized estimator.
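To make the objects in this abstract concrete, the following is a minimal sketch of a Kaplan-Meier estimate computed from per-time event and censoring counts, together with a generic Laplace-mechanism baseline that perturbs those counts before estimation. It is not one of the two methods introduced in this work. The function names, the integer time grid `1..horizon`, and the add/remove-one-individual neighboring relation are all assumptions made for illustration.

```python
import random


def count_histogram(times, events, horizon):
    """Count events and censorings per time step on the fixed grid 1..horizon."""
    deaths = [0.0] * horizon
    censored = [0.0] * horizon
    for t, e in zip(times, events):
        if not 1 <= t <= horizon:
            raise ValueError("times must be integers in 1..horizon")
        if e:
            deaths[t - 1] += 1.0
        else:
            censored[t - 1] += 1.0
    return deaths, censored


def km_from_counts(deaths, censored):
    """Kaplan-Meier curve S(t) = prod_{s <= t} (1 - d_s / n_s) from counts."""
    at_risk = sum(deaths) + sum(censored)
    surv, s = [], 1.0
    for d, c in zip(deaths, censored):
        if at_risk > 0:
            # Clamp so that noisy or rounded counts never leave [0, 1].
            s *= min(1.0, max(0.0, 1.0 - d / at_risk))
        surv.append(s)
        at_risk -= d + c
    return surv


def laplace(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_km(times, events, horizon, epsilon, rng):
    """Illustrative baseline (assumption, not the paper's method).

    Each individual falls into exactly one histogram cell, so under
    add/remove-one neighboring the histogram has L1 sensitivity 1 and
    Laplace noise of scale 1/epsilon per cell is applied. Clamping and
    the Kaplan-Meier computation are post-processing. Floating-point
    Laplace sampling is not hardened for production use.
    """
    deaths, censored = count_histogram(times, events, horizon)
    noisy_d = [max(0.0, d + laplace(1.0 / epsilon, rng)) for d in deaths]
    noisy_c = [max(0.0, c + laplace(1.0 / epsilon, rng)) for c in censored]
    return km_from_counts(noisy_d, noisy_c)


# Toy cohort: event flag 1 = event observed, 0 = censored.
times = [1, 2, 2, 3, 4, 4, 5]
events = [1, 1, 0, 1, 0, 1, 1]

exact = km_from_counts(*count_histogram(times, events, horizon=5))
print([round(v, 3) for v in exact])  # [0.857, 0.714, 0.536, 0.357, 0.0]

noisy = private_km(times, events, horizon=5, epsilon=1.0, rng=random.Random(0))
print([round(v, 3) for v in noisy])
```

On a cohort this small the noise at $\varepsilon = 1$ dominates the signal, so the sketch only illustrates the mechanics. The time grid is fixed in advance rather than read from the data, since a data-dependent grid would itself leak information.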