In empirical studies with time-to-event outcomes, investigators often use observational data to draw causal inferences about the effect of an exposure when randomized controlled trial data are unavailable. Model misspecification and lack of overlap are common issues in observational studies, and they often lead to inconsistent and inefficient estimators of the average treatment effect. Estimators targeting overlap-weighted effects have been proposed to address the challenge of poor overlap, and methods enabling flexible machine learning for nuisance models address model misspecification. However, the approaches that allow machine learning for nuisance models have not been extended to the setting of weighted average treatment effects for time-to-event outcomes when there is poor overlap. In this work, we propose a class of one-step cross-fitted double/debiased machine learning estimators for the weighted cumulative causal effect as a function of restriction time. We prove that the proposed estimators are consistent, asymptotically linear, and reach semiparametric efficiency bounds under regularity conditions. Our simulations show that the proposed estimators using nonparametric machine learning nuisance models perform as well as established methods that require correctly specified parametric nuisance models, illustrating that our estimators mitigate the need for oracle parametric nuisance models. We apply the proposed methods to real-world observational data from a UK primary care database to compare the effects of anti-diabetic drugs on clinical cancer outcomes.
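To make two of the ingredients named above concrete (overlap weighting and cross-fitting), the following is a minimal sketch and not the estimator proposed in this work. It assumes a fully observed continuous outcome with no censoring, a logistic propensity model fitted by Newton-Raphson in place of a flexible machine learning learner, and a simple weighting estimator rather than a one-step debiased one. All function names and the simulated data-generating process are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_logistic(X, a, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (intercept first).

    Stand-in for the propensity learner; any flexible model could be used here.
    """
    Z = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        w = p * (1.0 - p)
        grad = Z.T @ (a - p)
        # Small ridge term keeps the Hessian invertible.
        hess = (Z * w[:, None]).T @ Z + 1e-8 * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict_logistic(beta, X):
    """Predicted probability of treatment for each row of X."""
    Z = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def crossfit_overlap_effect(X, a, y, n_folds=5, seed=0):
    """Overlap-weighted mean difference with cross-fitted propensity scores.

    Each unit's propensity score is predicted by a model trained on the
    other folds, so no unit is scored by a model that saw its own data.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # balanced random fold labels
    e_hat = np.empty(len(y))
    for k in range(n_folds):
        train, test = folds != k, folds == k
        beta = fit_logistic(X[train], a[train])
        e_hat[test] = predict_logistic(beta, X[test])
    w1 = a * (1.0 - e_hat)   # treated units are weighted by 1 - e(x)
    w0 = (1.0 - a) * e_hat   # control units are weighted by e(x)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)


def simulate(n=4000, seed=1):
    """Hypothetical confounded data with a constant treatment effect of 2."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 2))
    e = 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 1.0 * X[:, 1])))
    a = rng.binomial(1, e).astype(float)
    y = 2.0 * a + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
    return X, a, y


X, a, y = simulate()
estimate = crossfit_overlap_effect(X, a, y)
print(f"cross-fitted overlap-weighted effect estimate: {estimate:.3f}")
```

Because the overlap weights are bounded between 0 and 1, units with propensity scores near 0 or 1 are down-weighted rather than given very large inverse-probability weights, which is the motivation for targeting overlap-weighted effects when overlap is poor. Extending this sketch to time-to-event outcomes would additionally require handling censoring and the restriction time, which the sketch does not attempt.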