Estimating the causal effect of time-varying treatments on survival outcomes in large observational studies is computationally demanding, particularly when outcomes are rare. While g-formula-based methods such as the iterative conditional expectation (ICE) estimator provide a principled framework for longitudinal causal inference, they become computationally expensive, especially when bootstrap-based variance estimation is required. In addition, outcome rarity at each time point induces severe class imbalance, which leads to instability and convergence problems in logistic regression and related models. To address these challenges, we propose a principled subsampling and reweighting strategy for longitudinal survival data that can be applied to a range of existing causal effect estimators in this setting, including the ICE estimator. The proposed method substantially reduces computational burden while preserving consistency and improving estimation stability in rare-outcome settings. We evaluate the method through simulations and validate it in a large-scale electronic health record (EHR) cohort study of social and behavioral determinants of health (SBDH) and suicide risk, demonstrating its effectiveness for modeling rare outcomes in longitudinal data.
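The general idea of subsampling and reweighting for a rare binary outcome can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not specify. It assumes a generic scheme: every event row is kept, non-event rows are kept with a fixed probability, and the kept rows are weighted by the inverse of their selection probability in a logistic regression. All names (`simulate_person_periods`, `subsample_nonevents`, `fit_weighted_logistic`, `keep_prob`) and the simulated data are hypothetical.

```python
import numpy as np


def simulate_person_periods(n_rows, beta, rng):
    """Simulate person-period rows: one covariate, one rare binary outcome."""
    x = rng.normal(size=n_rows)
    p = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1] * x)))
    y = (rng.random(n_rows) < p).astype(float)
    return x, y


def subsample_nonevents(y, keep_prob, rng):
    """Keep all event rows; keep each non-event row with probability keep_prob.

    Returns the kept row indices and inverse-probability-of-selection weights
    (1 for events, 1 / keep_prob for retained non-events).
    """
    keep = (y == 1) | (rng.random(len(y)) < keep_prob)
    idx = np.flatnonzero(keep)
    weights = np.where(y[idx] == 1, 1.0, 1.0 / keep_prob)
    return idx, weights


def fit_weighted_logistic(x, y, w, n_iter=50, tol=1e-10):
    """Weighted logistic regression (intercept + slope) via Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (w * (y - p))                      # weighted score
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X  # weighted information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Assumed illustration values: roughly 0.3% of rows are events.
rng = np.random.default_rng(0)
true_beta = np.array([-6.0, 0.7])
x, y = simulate_person_periods(200_000, true_beta, rng)

idx, w = subsample_nonevents(y, keep_prob=0.02, rng=rng)
beta_full = fit_weighted_logistic(x, y, np.ones_like(y))
beta_sub = fit_weighted_logistic(x[idx], y[idx], w)

print("rows used:", len(idx), "of", len(y))
print("full-data fit:     ", beta_full)
print("subsample + weights:", beta_sub)
```

In this sketch the weighted fit uses only a small fraction of the rows while targeting the same coefficients as the full-data fit. Standard errors from a weighted fit would typically need a robust (sandwich) or bootstrap estimator, which is omitted here.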