Causal discovery and inference from observational data is an essential problem in statistics that poses both modeling and computational challenges. These are typically addressed by imposing strict assumptions on the joint distribution, such as linearity. We consider the Bayesian estimation of the effects of hypothetical interventions in the Gaussian Process Network (GPN) model, a flexible causal framework that allows causal relationships to be described nonparametrically. We detail how to perform causal inference on GPNs by simulating the effect of an intervention across the whole network and propagating it to downstream variables. We further derive a simpler computational approximation that estimates the intervention distribution as a function of local variables only, modeling the conditional distributions via additive Gaussian processes. We extend both frameworks beyond the case of a known causal graph, incorporating uncertainty about the causal structure via Markov chain Monte Carlo methods. Simulation studies show that our approach identifies the effects of hypothetical interventions from non-Gaussian, non-linear observational data and accurately reflects the posterior uncertainty of the causal estimates. Finally, we compare the results of our GPN-based causal inference approach to existing methods on an $A.~thaliana$ gene expression dataset.
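To illustrate what propagating an intervention to downstream variables can look like, the following is a minimal sketch and not the method of the paper. It assumes a known three-node chain X -> Y -> Z, fixed kernel hyperparameters, and a known noise variance; the paper instead treats these quantities in a Bayesian way and also handles unknown graphs. The names `GP1D` and `simulate_do_x` are hypothetical, and each node is sampled from its pointwise GP predictive distribution rather than from a jointly sampled function, which is a further simplification.

```python
import numpy as np

def rbf(a, b, ls=1.0, var=1.0):
    """Squared-exponential kernel between two 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / ls) ** 2)

class GP1D:
    """GP regression with one parent and fixed hyperparameters (an assumption)."""

    def __init__(self, x, y, ls=1.0, var=1.0, noise=0.01):
        self.x, self.ls, self.var, self.noise = x, ls, var, noise
        self.mean = y.mean()  # constant prior mean
        K = rbf(x, x, ls, var) + noise * np.eye(len(x))
        self.L = np.linalg.cholesky(K)
        resid = y - self.mean
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, resid))

    def predict(self, xs):
        """Predictive mean and variance (including observation noise) at xs."""
        Ks = rbf(xs, self.x, self.ls, self.var)
        mu = self.mean + Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = self.var - np.sum(v ** 2, axis=0) + self.noise
        return mu, np.maximum(var, 1e-12)

    def sample(self, xs, rng):
        """Independent draws from the pointwise predictive distribution."""
        mu, var = self.predict(xs)
        return mu + np.sqrt(var) * rng.standard_normal(len(xs))

def simulate_do_x(gp_y, gp_z, x_value, n_samples, rng):
    """Fix X = x_value, then sample Y given X and Z given Y in topological order."""
    xs = np.full(n_samples, x_value)
    ys = gp_y.sample(xs, rng)
    return gp_z.sample(ys, rng)

rng = np.random.default_rng(0)

# Synthetic non-linear observational data from the chain X -> Y -> Z.
n = 200
X = rng.standard_normal(n)
Y = np.sin(X) + 0.1 * rng.standard_normal(n)
Z = Y ** 2 + 0.1 * rng.standard_normal(n)

gp_y = GP1D(X, Y)  # models Y given its parent X
gp_z = GP1D(Y, Z)  # models Z given its parent Y

zs = simulate_do_x(gp_y, gp_z, x_value=1.0, n_samples=2000, rng=rng)
estimate = zs.mean()

# Analytic E[Z | do(X = 1)] for this simulator: sin(1)^2 + Var(noise in Y).
truth = np.sin(1.0) ** 2 + 0.1 ** 2
print(f"estimated E[Z | do(X=1)] = {estimate:.3f}, simulator value = {truth:.3f}")
```

The samples in `zs` approximate the whole intervention distribution of Z under do(X = 1), so quantiles or other summaries can be read off in the same way as the mean.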