To protect privacy and reduce computational burden, we propose a distributed estimation procedure based on Poisson subsampling for the Cox model with massive survival datasets from multi-center, decentralized sources. The proposed estimator is computed using optimal subsampling probabilities that we derive, and it requires only one round of communication to transmit subsample-based summary-level statistics between storage sites. To support inference, we rigorously establish the asymptotic properties of the proposed estimator. An extensive simulation study demonstrates that the proposed approach is effective, and we apply the methodology to a large U.S. airline dataset.
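As a rough illustration of the workflow described above, the following Python sketch draws a Poisson subsample at each site, fits an inverse-probability-weighted Cox model locally, and combines the site-level summaries in a single communication round. It is a minimal sketch under stated assumptions, not the estimator derived in this work: the subsampling probabilities are a uniform placeholder rather than the optimal ones, the information-weighted combination rule is one common one-shot aggregation chosen for illustration, and all function names (`poisson_subsample`, `fit_weighted_cox`, `site_summary`, `combine`, `simulate_site`) are hypothetical.

```python
import numpy as np

def poisson_subsample(probs, rng):
    """Poisson subsampling: keep record i independently with probability probs[i]."""
    return np.flatnonzero(rng.random(probs.shape[0]) < probs)

def cox_score_info(beta, event, X, w):
    """Score and information of the weighted Cox partial likelihood.
    Rows must be sorted by decreasing time; assumes no tied event times."""
    r = w * np.exp(X @ beta)
    S0 = np.cumsum(r)                       # weighted risk-set sums
    S1 = np.cumsum(r[:, None] * X, axis=0)
    S2 = np.cumsum(r[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    xbar = S1 / S0[:, None]                 # risk-set weighted covariate means
    wd = w * event                          # only observed events contribute
    score = (wd[:, None] * (X - xbar)).sum(axis=0)
    cov = S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
    info = (wd[:, None, None] * cov).sum(axis=0)
    return score, info

def fit_weighted_cox(time, event, X, w, n_iter=50, tol=1e-8):
    """Newton-Raphson on the weighted partial likelihood, starting from zero."""
    order = np.argsort(-time)               # decreasing time gives risk sets by cumsum
    event, X, w = event[order], X[order], w[order]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = cox_score_info(beta, event, X, w)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = cox_score_info(beta, event, X, w)
    return beta, info

def site_summary(time, event, X, probs, rng):
    """Local step: subsample, weight by 1/probability, return (estimate, information)."""
    idx = poisson_subsample(probs, rng)
    w = 1.0 / probs[idx]
    return fit_weighted_cox(time[idx], event[idx].astype(float), X[idx], w)

def combine(summaries):
    """Central step (assumed rule): information-weighted average of site estimates."""
    total_info = sum(info for _, info in summaries)
    weighted = sum(info @ beta for beta, info in summaries)
    return np.linalg.solve(total_info, weighted)

def simulate_site(n, beta, rng):
    """Synthetic survival data: exponential baseline hazard, independent censoring."""
    X = rng.normal(size=(n, beta.shape[0]))
    t_event = rng.exponential(size=n) / np.exp(X @ beta)
    t_cens = rng.exponential(scale=2.0, size=n)
    return np.minimum(t_event, t_cens), t_event <= t_cens, X

rng = np.random.default_rng(0)
true_beta = np.array([0.5, -0.5])
summaries = []
for _ in range(3):                          # three hypothetical storage sites
    time, event, X = simulate_site(2000, true_beta, rng)
    probs = np.full(time.shape[0], 0.3)     # placeholder: uniform, not optimal
    summaries.append(site_summary(time, event, X, probs, rng))
beta_hat = combine(summaries)               # one round: only summaries leave each site
print(beta_hat)
```

In this sketch only the pair (local estimate, local information matrix) leaves each site, so no individual-level records are shared. Replacing the uniform `probs` with site-specific, data-dependent probabilities is where an optimal subsampling scheme would enter.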