One of the goals of causal inference is to generalize from past experiments and observational data to novel conditions. While it is possible in principle to learn a mapping from a novel experimental condition to an outcome of interest, provided the training data contain a sufficient variety of experiments, coping with the large combinatorial space of possible interventions is hard. Under a typical sparse experimental design, this mapping is ill-posed without heavy regularization or prior distributions. Such assumptions may or may not be reliable, and can be hard to defend or test. In this paper, we take a close look at how to warrant a leap from past experiments to novel conditions based on minimal assumptions about the factorization of the distribution of the manipulated system, communicated in the well-understood language of factor graph models. A postulated $\textit{interventional factor model}$ (IFM) may not always be informative, but it conveniently abstracts away the need to explicitly model unmeasured confounding and feedback mechanisms, leading to directly testable claims. Given an IFM and datasets from a collection of experimental regimes, we derive conditions for the identifiability of the expected outcomes of new regimes never observed in these training data. We implement our framework using several efficient algorithms, and apply them to a range of semi-synthetic experiments.