It is often said that the fundamental problem of causal inference is a missing data problem: comparing responses under two hypothetical treatment assignments is difficult because only one potential response is observed for each experimental unit. In this paper, we consider the implications of the converse view: that missing data problems are a form of causal inference. We make explicit how the missing data problem of recovering the complete data law from the observed law can be viewed as identification of a joint distribution over counterfactual variables, each corresponding to the value a variable would have taken had we (possibly contrary to fact) been able to observe it. Drawing analogies with causal inference, we show how identification assumptions in missing data can be encoded in terms of graphical models defined over counterfactual and observed variables. We review recent results in missing data identification from this viewpoint. In doing so, we note interesting similarities and differences between missing data and causal identification theories.