In a typical two-phase design, a random sample is drawn from the target population in phase 1, and only a subset of the variables of interest is collected. In phase 2, a subsample of the phase-1 cohort is selected and additional variables are measured on it. This design induces a coarsened data structure: the phase-2 variables are observed only for the subsample. We assume coarsening at random, that is, the phase-2 sampling mechanism depends only on fully observed variables. We review existing estimators, including the generalized raking estimator and the inverse probability of censoring weighted targeted maximum likelihood estimator (IPCW-TMLE), along with extensions of the latter that also target the phase-2 sampling mechanism to improve efficiency. We further introduce a new class of estimators, constructed within the TMLE framework, that are asymptotically equivalent.
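As a rough illustration of the two-phase setting and the coarsening-at-random assumption described above, the following sketch simulates a phase-1 cohort, draws a phase-2 subsample whose selection probability depends only on a phase-1 variable, and compares a naive complete-case mean with a simple inverse-probability-weighted (Hájek) mean. This is plain inverse probability weighting with known sampling probabilities, not the generalized raking or IPCW-TMLE estimators reviewed here. The data-generating model and the names `simulate_two_phase` and `ipw_mean` are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_two_phase(n=20000, seed=0):
    """Simulate a hypothetical two-phase sample (illustrative model only)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)                    # phase-1 variable, observed for everyone
    y = 1.0 + 2.0 * w + rng.normal(size=n)    # phase-2 variable, true mean is 1.0
    # Coarsening at random: selection into phase 2 depends only on w,
    # which is fully observed in phase 1.
    pi = 1.0 / (1.0 + np.exp(-(-0.5 + w)))
    r = rng.binomial(1, pi)                   # phase-2 inclusion indicator
    return w, y, pi, r


def ipw_mean(y_obs, pi_obs):
    """Hajek-type inverse-probability-weighted mean over phase-2 units."""
    weights = 1.0 / pi_obs
    return np.sum(weights * y_obs) / np.sum(weights)


w, y, pi, r = simulate_two_phase()
selected = r == 1

# Only phase-2 units contribute their y values; the rest are treated as unobserved.
naive = y[selected].mean()
ipw = ipw_mean(y[selected], pi[selected])

print(f"true mean 1.0 | complete-case mean {naive:.3f} | IPW mean {ipw:.3f}")
```

Because units with larger `w` are more likely to enter phase 2 in this toy model, the complete-case mean is biased upward, while weighting each phase-2 unit by the inverse of its known sampling probability approximately recovers the population mean.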