The case-cohort design is a commonly used, cost-effective sampling strategy for large cohort studies in which some covariates are expensive to measure or obtain. In this paper, we consider regression analysis under a case-cohort study with interval-censored failure time data, where the failure time is known only to fall within an interval rather than being observed exactly. A common approach to analyzing data from a case-cohort study is inverse probability weighting, in which only subjects in the case-cohort sample are used in estimation and each subject is weighted by the inverse of its probability of inclusion in the case-cohort sample. This approach, though consistent, is generally inefficient because it does not incorporate information from outside the case-cohort sample. To improve efficiency, we first develop a sieve maximum weighted likelihood estimator under the Cox model based on the case-cohort sample, and then propose a procedure that updates this estimator using information from the full cohort. We show that the updated estimator is consistent, asymptotically normal, and more efficient than the original estimator. The proposed method can flexibly incorporate auxiliary variables to further improve estimation efficiency. We employ a weighted bootstrap procedure for variance estimation. Simulation results indicate that the proposed method works well in practical situations. An application to a real diabetes study is provided for illustration.
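To make the inverse probability weighting idea concrete, the following is a minimal sketch of how sampling weights might be assigned in a case-cohort sample. It is not the estimator proposed in the paper: it covers neither the sieve maximum weighted likelihood step nor the full-cohort update. It assumes that every case is sampled (inclusion probability 1) and that non-cases enter only through a Bernoulli subcohort with a known probability. The names `case_cohort_weights`, `weighted_mean`, and `subcohort_prob` are hypothetical and chosen here for illustration only.

```python
def case_cohort_weights(is_case, in_subcohort, subcohort_prob):
    """Return one inverse-probability weight per cohort member.

    Assumption (not from the paper): all cases are sampled, and each
    non-case is sampled independently with probability subcohort_prob.
    """
    if not 0.0 < subcohort_prob <= 1.0:
        raise ValueError("subcohort_prob must lie in (0, 1]")
    weights = []
    for case, sub in zip(is_case, in_subcohort):
        if case:
            # Cases are always included, so their weight is 1 / 1.
            weights.append(1.0)
        elif sub:
            # Sampled non-cases stand in for 1 / subcohort_prob subjects.
            weights.append(1.0 / subcohort_prob)
        else:
            # Subjects outside the case-cohort sample contribute nothing.
            weights.append(0.0)
    return weights


def weighted_mean(values, weights):
    """Weighted average of an expensive covariate over the sampled subjects."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no subject has a positive weight")
    return sum(v * w for v, w in zip(values, weights)) / total


# Toy cohort of three subjects: one case, one sampled non-case, one unsampled.
toy_weights = case_cohort_weights(
    is_case=[True, False, False],
    in_subcohort=[False, True, False],
    subcohort_prob=0.5,
)
print(toy_weights)  # [1.0, 2.0, 0.0]
```

In a weighted likelihood analysis, weights of this kind would multiply each sampled subject's log-likelihood contribution. Subjects with weight zero are dropped entirely, which is the source of the efficiency loss that the full-cohort update aims to recover.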