We study the problem of estimating survival causal effects, where the aim is to characterize the impact of an intervention on survival times, i.e., how long it takes for an event to occur. Applications include determining whether a drug reduces the time to ICU discharge or whether an advertising campaign increases customer dwell time. Historically, the most popular estimates have been based on parametric or semiparametric (e.g., proportional hazards) models; however, these methods suffer from problematic levels of bias. Recently, debiased machine learning approaches have become increasingly popular, especially in applications to large datasets. However, despite their appealing theoretical properties, these estimators tend to be unstable because the debiasing step uses the inverses of small estimated probabilities: small errors in the estimated probabilities can result in huge changes in their inverses and therefore in the resulting estimator. This problem is exacerbated in survival settings, where the relevant probabilities are products of treatment assignment and censoring probabilities. We propose a covariate balancing approach that estimates these inverses directly, sidestepping this problem. The result is an estimator that is stable in practice and enjoys many of the same theoretical properties. In particular, under overlap and asymptotic equicontinuity conditions, our estimator is asymptotically normal with negligible bias and optimal variance. Our experiments on synthetic and semi-synthetic data demonstrate that our method achieves bias competitive with, and variance smaller than, debiased machine learning approaches.
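The instability of inverse weights described above can be illustrated with a small numerical sketch. This is not the proposed estimator or any method from the paper; it only shows, under assumed values, how the same small additive error in estimated probabilities changes the inverse of their product. The function names, the probability values, and the error size of 0.02 are all hypothetical choices made for illustration.

```python
def inverse_weight(p_treat, p_uncensored):
    """Inverse of the product of a treatment-assignment probability
    and a probability of remaining uncensored."""
    return 1.0 / (p_treat * p_uncensored)


def weight_sensitivity(p_treat, p_uncensored, err):
    """Compare the weight at the true probabilities with the weight obtained
    when both probabilities are underestimated by the same additive error.

    Returns (true weight, perturbed weight, relative change)."""
    w_true = inverse_weight(p_treat, p_uncensored)
    w_est = inverse_weight(p_treat - err, p_uncensored - err)
    rel_change = abs(w_est - w_true) / w_true
    return w_true, w_est, rel_change


if __name__ == "__main__":
    err = 0.02  # hypothetical estimation error, identical in every scenario
    # Hypothetical (treatment probability, non-censoring probability) pairs,
    # ordered from comfortable overlap to poor overlap.
    scenarios = [(0.5, 0.9), (0.1, 0.5), (0.05, 0.2)]
    for p_treat, p_unc in scenarios:
        w_true, w_est, rel = weight_sensitivity(p_treat, p_unc, err)
        print(
            f"p_treat={p_treat:.2f} p_uncensored={p_unc:.2f} "
            f"true weight={w_true:.2f} perturbed weight={w_est:.2f} "
            f"relative change={rel:.1%}"
        )
```

With identical absolute error in each scenario, the relative change in the weight grows sharply as the product of the two probabilities shrinks, which is the sensitivity that motivates estimating the inverses directly rather than inverting estimated probabilities.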