Causal inference from observational data has recently found many applications in machine learning. While sound and complete algorithms exist to compute causal effects, many of these algorithms require explicit access to conditional likelihoods over the observational distribution, which are difficult to estimate in the high-dimensional regime, for example with images. To alleviate this issue, researchers have approached the problem by simulating causal relations with neural models and obtained impressive results. However, none of these existing approaches can be applied to generic scenarios, such as causal graphs on image data with latent confounders, and none can produce conditional interventional samples. In this paper, we show that any identifiable causal effect given an arbitrary causal graph can be computed through push-forward computations of conditional generative models. Based on this result, we devise a diffusion-based approach to sample from any (conditional) interventional distribution on image data. To showcase our algorithm's performance, we conduct experiments on a Colored MNIST dataset in which both the treatment ($X$) and the target ($Y$) variables are images, and obtain interventional samples from $P(y \mid do(x))$. As an application of our algorithm, we evaluate two large conditional generative models that are pre-trained on the CelebA dataset by analyzing the strength of spurious correlations and the level of disentanglement they achieve.
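
To make the push-forward idea concrete, the following is a minimal sketch of its simplest special case, not the paper's algorithm. It assumes a hypothetical fully observed graph $Z \to X$, $Z \to Y$, $X \to Y$ with binary variables, where the back-door adjustment gives $P(y \mid do(x)) = \sum_z P(y \mid x, z) P(z)$. The functions `sample_z` and `sample_y_given_xz` and the probability tables are illustrative stand-ins for trained conditional generative models (the paper uses diffusion models on images); only samples are drawn and no likelihood is evaluated.

```python
import random

# Hypothetical toy parameters for the graph Z -> X, Z -> Y, X -> Y (all binary).
P_Z1 = 0.3  # P(Z = 1)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}  # P(Y = 1 | x, z)

def sample_z(rng):
    """Stand-in for a generative model of P(z)."""
    return int(rng.random() < P_Z1)

def sample_y_given_xz(x, z, rng):
    """Stand-in for a conditional generative model of P(y | x, z)."""
    return int(rng.random() < P_Y1_GIVEN_XZ[(x, z)])

def sample_y_do_x(x, rng):
    """Draw one sample from P(y | do(x)): sample z from its marginal,
    then push it through the conditional model with x held fixed."""
    z = sample_z(rng)
    return sample_y_given_xz(x, z, rng)

def exact_y1_do_x(x):
    """Closed-form back-door adjustment, used only to check the sampler."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * pz for z, pz in ((0, 1 - P_Z1), (1, P_Z1)))

rng = random.Random(0)
n = 100_000
estimate = sum(sample_y_do_x(1, rng) for _ in range(n)) / n
print(f"Monte Carlo estimate of P(Y=1 | do(X=1)): {estimate:.3f}")
print(f"Exact back-door value: {exact_y1_do_x(1):.3f}")
```

The sampler never conditions $z$ on $x$, which is what distinguishes the interventional draw from an observational one; with latent confounders the chain of conditional samplers would be longer, but the same sample-then-push-forward pattern is what the abstract describes.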