The G-formula is a popular approach for estimating treatment or exposure effects from longitudinal data that are subject to time-varying confounding. G-formula estimation is typically performed by Monte Carlo simulation, with non-parametric bootstrapping used for inference. We show that the G-formula can instead be implemented by exploiting existing methods for multiple imputation (MI) of synthetic data. This requires an existing modified version of Rubin's variance estimator. In practice, missing data are ubiquitous in longitudinal datasets. We show that such missing data can be readily accommodated as part of the MI procedure when using the G-formula, and we describe how existing MI software can be used to implement the approach. We explore its performance using a simulation study and an application in cystic fibrosis.
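A minimal Python sketch of the idea follows. It rests on strong simplifying assumptions that are ours, not the authors': two time points, a continuous time-varying confounder, binary treatments, correctly specified normal linear models, and no missing values in the observed data. All function names and the data-generating model are hypothetical illustrations, not the authors' implementation or any MI package's API. In each imputation, the model parameters are drawn from their posterior (as in proper MI), confounders and outcomes are imputed for a synthetic sample of `n_syn` individuals whose treatments are fixed by the intervention, and the M estimates are combined with the fully synthetic data variance estimator of Raghunathan, Reiter and Rubin (2003), T = (1 + 1/M)B - Ū. We assume this corresponds to the modified Rubin's estimator referred to above.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_observed(n, rng):
    """Hypothetical data-generating model with time-varying confounding:
    L1 is affected by A0 and also confounds the A1 -> Y relationship."""
    l0 = rng.normal(size=n)
    a0 = rng.binomial(1, expit(l0))
    l1 = 0.5 * l0 + a0 + rng.normal(size=n)
    a1 = rng.binomial(1, expit(l1))
    y = l1 + a0 + a1 + rng.normal(size=n)
    return l0, a0, l1, a1, y


def draw_linear_posterior(X, y, rng):
    """Draw (beta, sigma) for a normal linear model under the prior
    p(beta, sigma^2) proportional to 1/sigma^2, as used in proper MI."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta_hat = xtx_inv @ X.T @ y
    rss = np.sum((y - X @ beta_hat) ** 2)
    sigma2 = rss / rng.chisquare(n - p)
    beta = rng.multivariate_normal(beta_hat, sigma2 * xtx_inv)
    return beta, np.sqrt(sigma2)


def synthetic_combine(estimates, variances):
    """Combine M estimates with the fully synthetic data variance estimator
    T = (1 + 1/M) * B - Ubar (Raghunathan, Reiter and Rubin, 2003)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = len(q)
    b = q.var(ddof=1)  # between-imputation variance
    return q.mean(), (1 + 1 / m) * b - u.mean()


def gformula_mi(data, regimes, n_syn, m, rng):
    """Estimate E[Y | do(regimes[0])] - E[Y | do(regimes[1])] by imputing
    synthetic individuals whose treatments are fixed by each regime."""
    l0, a0, l1, a1, y = data
    ones = np.ones(len(l0))
    estimates, variances = [], []
    for _ in range(m):
        # Fresh posterior draws of every model in each imputation.
        b0, s0 = draw_linear_posterior(ones[:, None], l0, rng)
        b1, s1 = draw_linear_posterior(np.column_stack([ones, l0, a0]), l1, rng)
        by, sy = draw_linear_posterior(
            np.column_stack([ones, l0, a0, l1, a1]), y, rng)
        means, within = [], []
        for g0, g1 in regimes:
            # Impute the synthetic individuals' L0, L1 and Y in time order.
            l0_s = b0[0] + s0 * rng.normal(size=n_syn)
            l1_s = b1[0] + b1[1] * l0_s + b1[2] * g0 + s1 * rng.normal(size=n_syn)
            y_s = (by[0] + by[1] * l0_s + by[2] * g0 + by[3] * l1_s
                   + by[4] * g1 + sy * rng.normal(size=n_syn))
            means.append(y_s.mean())
            within.append(y_s.var(ddof=1) / n_syn)
        # Independent synthetic samples per regime, so within-variances add.
        estimates.append(means[0] - means[1])
        variances.append(within[0] + within[1])
    return synthetic_combine(estimates, variances)


rng = np.random.default_rng(2024)
data = simulate_observed(5000, rng)
est, var = gformula_mi(data, regimes=[(1, 1), (0, 0)], n_syn=50_000, m=50, rng=rng)
print(f"always vs never treat: {est:.3f} (SE {np.sqrt(max(var, 0.0)):.3f})")
```

Under this hypothetical model the true contrast between "always treat" and "never treat" is 3 (A0 contributes 1 directly and 1 through L1, and A1 contributes 1), so the printed estimate should lie close to 3. Because T subtracts Ū, it can be negative when `n_syn` is small relative to the between-imputation variation; choosing a large synthetic sample keeps Ū small.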