To protect privacy and reduce the computational burden, we propose a fast subsampling procedure for the Cox model with massive survival datasets from multi-center, decentralized sources. The proposed estimator is computed from optimal subsampling probabilities that we derive, and it requires only one round of communication, in which subsample-based summary-level statistics are transmitted between storage sites. For inference, we rigorously establish the asymptotic properties of the proposed estimator. An extensive simulation study demonstrates that the proposed approach is effective. The methodology is applied to analyze a large dataset from U.S. airlines.
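As a rough illustration of the two ideas named in the abstract (subsampling with non-uniform probabilities, and a single round of communication carrying only summary statistics), the following Python sketch may help. It is not the authors' estimator: the function names, the inverse-probability weights 1 / (r * p_i), and the information-weighted averaging rule are all assumptions chosen as a generic one-shot aggregation scheme, and the per-site Cox fit that would produce each estimate and information matrix is left out.

```python
import numpy as np


def draw_weighted_subsample(probs, r, rng):
    """Hypothetical helper: draw r indices with replacement according to
    `probs` and return them with inverse-probability weights 1 / (r * p_i)."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum()  # normalize so the probabilities sum to 1
    idx = rng.choice(len(probs), size=r, replace=True, p=probs)
    weights = 1.0 / (r * probs[idx])
    return idx, weights


def combine_site_summaries(estimates, informations):
    """Hypothetical one-round aggregation: each site sends only its
    coefficient estimate and an information (negative Hessian) matrix;
    the center returns the information-weighted average."""
    total_info = sum(informations)
    weighted_sum = sum(info @ beta for info, beta in zip(informations, estimates))
    return np.linalg.solve(total_info, weighted_sum)


# Toy inputs (made up for illustration, not results from the paper).
rng = np.random.default_rng(0)
idx, weights = draw_weighted_subsample([0.1, 0.2, 0.3, 0.4], r=5, rng=rng)

site_estimates = [np.array([1.0, 0.0]), np.array([5.0, 4.0])]
site_informations = [3.0 * np.eye(2), np.eye(2)]
pooled = combine_site_summaries(site_estimates, site_informations)
```

In this sketch no individual-level records leave a site; only a coefficient vector and a matrix per site are transmitted, which is the sense in which a one-round, summary-level scheme can limit both communication and privacy exposure.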