Understanding and quantifying cause and effect is an important problem in many domains. The generally agreed solution to this problem is to perform a randomised controlled trial. However, even when randomised controlled trials can be performed, they usually have relatively short durations due to cost considerations. This makes learning long-term causal effects very challenging in practice, since the long-term outcome is only observed after a long delay. In this paper, we study the identification and estimation of long-term treatment effects when both experimental and observational data are available. Previous work provided a strategy for estimating long-term causal effects in this data regime. However, that strategy only works under the assumption that there are no unobserved confounders in the observational data. Here, we specifically address the challenging case where unmeasured confounders are present in the observational data. Our long-term causal effect estimator is obtained by combining regression residuals with short-term experimental outcomes in a specific manner to create an instrumental variable, which is then used to quantify the long-term causal effect through instrumental variable regression. We prove that this estimator is unbiased, and we study its variance analytically. In the context of the front-door causal structure, this provides a new causal estimator, which may be of independent interest. Finally, we empirically test our approach on synthetic data, as well as on real data from the International Stroke Trial.
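To illustrate why instrumental variable regression helps when unmeasured confounders are present, the following is a minimal generic sketch on simulated data. It is not the estimator proposed in this paper: the instrument `z` here is simply drawn at random rather than built from regression residuals and short-term experimental outcomes, and the data-generating process, the coefficients, and the function names `simulate`, `ols_estimate`, and `iv_estimate` are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=200_000, true_effect=2.0, seed=0):
    """Simulate a linear model with an unobserved confounder u (assumed setup)."""
    rng = np.random.default_rng(seed)
    u = rng.normal(size=n)  # unobserved confounder, affects both x and y
    z = rng.normal(size=n)  # instrument: affects x, independent of u
    x = z + u + rng.normal(size=n)  # treatment
    y = true_effect * x + 3.0 * u + rng.normal(size=n)  # outcome
    return z, x, y


def ols_estimate(x, y):
    """Slope of the ordinary least squares regression of y on x."""
    x_c = x - x.mean()
    return float(x_c @ (y - y.mean()) / (x_c @ x_c))


def iv_estimate(z, x, y):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    z_c = z - z.mean()
    return float(z_c @ (y - y.mean()) / (z_c @ (x - x.mean())))


z, x, y = simulate()
beta_ols = ols_estimate(x, y)    # biased upward by the confounder
beta_iv = iv_estimate(z, x, y)   # close to the simulated true effect
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}  true effect: 2.000")
```

In this simulated setup the ordinary regression absorbs the influence of the confounder into the slope, whereas the instrument only moves the outcome through the treatment, so the ratio of covariances recovers the simulated effect.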