Massive survival datasets have become common in survival analysis. This study proposes a subsampling algorithm for the Cox proportional hazards model with time-dependent covariates, intended for settings where the sample is extraordinarily large but computing resources are relatively limited. A subsample estimator is obtained by maximizing a weighted partial likelihood and is shown to be consistent and asymptotically normal. The optimal subsampling probabilities are derived in explicit form by minimizing the asymptotic mean squared error of the subsample estimator. Simulation studies show that the proposed method satisfactorily approximates the full-data estimator. The method is then applied to a corporate loan dataset and a breast cancer dataset with different censoring rates, and the results confirm its practical advantages.
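The general idea of fitting a Cox model on a weighted subsample can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the procedure of the study: covariates are time-fixed rather than time-dependent, the sampling probabilities `pi` are uniform placeholders rather than the optimal probabilities derived in the paper, ties are handled Breslow-style, and the names `weighted_cox_fit` and `subsample_fit` are hypothetical.

```python
import numpy as np


def weighted_cox_fit(time, event, X, w, n_iter=25, tol=1e-8):
    """Newton's method for a weighted Cox partial likelihood (time-fixed covariates)."""
    order = np.argsort(-time)                 # sort by descending time
    t, X, d = time[order], X[order], (w * event)[order]
    w = w[order]
    # Index of the last row sharing each time value, so that the risk set of
    # row i is rows 0..last[i] (every subject with time >= t[i]).
    last = np.searchsorted(-t, -t, side="right") - 1
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        r = w * np.exp(X @ beta)              # weighted relative risks
        S0 = np.cumsum(r)[last]
        S1 = np.cumsum(r[:, None] * X, axis=0)[last]
        S2 = np.cumsum(r[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)[last]
        xbar = S1 / S0[:, None]               # weighted risk-set covariate means
        grad = (d[:, None] * (X - xbar)).sum(axis=0)
        cov = S2 / S0[:, None, None] - xbar[:, :, None] * xbar[:, None, :]
        info = (d[:, None, None] * cov).sum(axis=0)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def subsample_fit(time, event, X, r, pi, rng):
    """Draw r rows with replacement using probabilities pi, then fit with weights 1/(r*pi)."""
    idx = rng.choice(len(time), size=r, replace=True, p=pi)
    w = 1.0 / (r * pi[idx])                   # inverse-probability weights
    return weighted_cox_fit(time[idx], event[idx], X[idx], w)


# Simulated data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
n, beta_true = 5000, np.array([0.5, -0.5])
X = rng.normal(size=(n, 2))
T = rng.exponential(1.0 / np.exp(X @ beta_true))   # event times
C = rng.exponential(2.0, size=n)                   # censoring times
time, event = np.minimum(T, C), (T <= C).astype(float)

beta_full = weighted_cox_fit(time, event, X, np.ones(n))
pi = np.full(n, 1.0 / n)                           # uniform placeholder probabilities
beta_sub = subsample_fit(time, event, X, r=1000, pi=pi, rng=rng)
print("full data:", beta_full)
print("subsample:", beta_sub)
```

Replacing `pi` with data-dependent probabilities leaves the rest of the sketch unchanged, since the weights `1 / (r * pi)` keep the weighted partial likelihood targeting the full-data one.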