The case-cohort design obtains complete covariate data only on cases and on a random sample (the subcohort) of the entire cohort. Subsequent publications described the use of stratification and weight calibration to increase the efficiency of estimates of Cox model log relative hazards, and some work has addressed the estimation of pure risk. Yet there are few examples of these options in the medical literature, and we could not find programs currently available online to analyze them. We therefore present a unified approach and R software to facilitate such analyses. We used influence functions adapted to the various design and analysis options, together with variance calculations that take the two-phase sampling into account. This work clarifies when the widely used "robust" variance estimate of Barlow is appropriate. The corresponding R software, CaseCohortCoxSurvival, supports analysis with or without stratification and/or weight calibration, for subcohort sampling with or without replacement. We also allow phase-two data to be missing at random for stratified designs. We provide inference not only for log relative hazards in the Cox model, but also for cumulative baseline hazards and covariate-specific pure risks. We hope these calculations and software will promote wider use of more efficient and principled design and analysis options for case-cohort studies.
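To make the link between design weights, cumulative baseline hazards and pure risks concrete, here is a minimal Python sketch. It is not the CaseCohortCoxSurvival API, and the names `Subject`, `design_weights`, `breslow_cumhaz` and `pure_risk` are hypothetical. It rests on several stated assumptions. Cases are always sampled and get weight 1. Non-case subcohort members in stratum s get the inverse sampling fraction N_s / n_s. The log relative hazard `beta` for a single covariate is taken as already estimated. Risk sets are weighted in an inverse-probability (Horvitz-Thompson-type) way, which is only one of several conventions; Prentice, Self-Prentice and Barlow treat cases outside the subcohort differently. Weight calibration and the influence-function variance calculations are omitted. Pure risk in (tau1, tau2] for covariate value x is computed as 1 - exp(-exp(beta * x) * [Lambda0(tau2) - Lambda0(tau1)]), i.e. in the absence of competing risks.

```python
import math
from dataclasses import dataclass


@dataclass
class Subject:
    """One phase-two subject: a case or a subcohort member (hypothetical layout)."""
    time: float          # follow-up time (event or censoring)
    event: bool          # True if the subject is a case
    x: float             # a single covariate, for illustration
    stratum: str         # subcohort sampling stratum
    in_subcohort: bool   # True if drawn into the subcohort


def design_weights(subjects, cohort_sizes):
    """Assumed inverse-probability weights: cases get 1; non-case
    subcohort members in stratum s get N_s / n_s (no calibration)."""
    n_sub = {}
    for s in subjects:
        if s.in_subcohort:
            n_sub[s.stratum] = n_sub.get(s.stratum, 0) + 1
    return [1.0 if s.event else cohort_sizes[s.stratum] / n_sub[s.stratum]
            for s in subjects]


def breslow_cumhaz(subjects, weights, beta, t):
    """Weighted Breslow estimate of the cumulative baseline hazard at time t,
    treating beta as already estimated."""
    event_times = sorted({s.time for s in subjects if s.event and s.time <= t})
    cumhaz = 0.0
    for tk in event_times:
        # Number of cases failing at tk (cases carry weight 1).
        d_k = sum(1 for s in subjects if s.event and s.time == tk)
        # Weighted risk set: every phase-two subject still at risk at tk.
        denom = sum(w * math.exp(beta * s.x)
                    for s, w in zip(subjects, weights) if s.time >= tk)
        cumhaz += d_k / denom
    return cumhaz


def pure_risk(subjects, weights, beta, x, tau1, tau2):
    """Pure risk in (tau1, tau2] for covariate value x, absent competing risks."""
    delta = (breslow_cumhaz(subjects, weights, beta, tau2)
             - breslow_cumhaz(subjects, weights, beta, tau1))
    return 1.0 - math.exp(-math.exp(beta * x) * delta)


# Hypothetical toy data: one stratum "A" with 20 cohort members,
# a subcohort of 4 (one of them a case), plus 2 cases outside the subcohort.
cohort = {"A": 20}
data = [
    Subject(2.0, True, 0.0, "A", False),
    Subject(3.0, True, 1.0, "A", True),
    Subject(5.0, True, 0.0, "A", False),
    Subject(4.0, False, 0.0, "A", True),
    Subject(6.0, False, 1.0, "A", True),
    Subject(6.0, False, 0.0, "A", True),
]
w = design_weights(data, cohort)
print(w)  # [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(pure_risk(data, w, beta=0.5, x=1.0, tau1=0.0, tau2=6.0))
```

Keeping the weights separate from the hazard computation mirrors the idea that the design (stratified or not, calibrated or not) enters only through the weights. A real analysis would also estimate `beta` from a weighted Cox model and would need variances that account for the two-phase sampling.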