With only observational data on two variables, and without further assumptions, it is not possible to infer which one causes the other. Much of the causal literature has focused on guaranteeing identifiability of causal direction in statistical models under strong assumptions, such as additive noise or restrictions on parameter count. These methods are then tested on realistic datasets, most of which violate those assumptions. Building on previous attempts, we show how to use causal assumptions within the Bayesian framework. This allows us to specify models with realistic assumptions while also encoding independent causal mechanisms, which leads to an asymmetry between the causal directions. Identifying causal direction then becomes a Bayesian model selection problem. We analyse why Bayesian model selection works for known identifiable cases and for flexible model classes, and we provide correctness guarantees about its behaviour. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. This model outperforms previous methods on a wide range of benchmark datasets with varying data-generating assumptions, showing the usefulness of our method.
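As a rough illustration of how causal direction can be cast as Bayesian model selection, the comparison might be written as below. The notation is assumed here for illustration and does not come from the abstract: $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{N}$ is the dataset, $\theta$ and $\phi$ parametrise the cause and the mechanism respectively, and the two candidate models are given equal prior probability. Independent causal mechanisms are encoded, under this sketch, by placing independent priors on $\theta$ and $\phi$.

```latex
% Marginal likelihood of each causal model (assumed notation).
% Independent priors p(\theta) p(\phi) encode independent causal mechanisms.
\begin{align}
p(\mathcal{D} \mid M_{X \to Y})
  &= \iint p(\mathbf{x} \mid \theta)\, p(\mathbf{y} \mid \mathbf{x}, \phi)\,
     p(\theta)\, p(\phi)\, \mathrm{d}\theta\, \mathrm{d}\phi, \\
p(\mathcal{D} \mid M_{Y \to X})
  &= \iint p(\mathbf{y} \mid \theta')\, p(\mathbf{x} \mid \mathbf{y}, \phi')\,
     p(\theta')\, p(\phi')\, \mathrm{d}\theta'\, \mathrm{d}\phi'.
\end{align}

% Decision rule under equal model priors: prefer X -> Y when the
% log evidence ratio is positive, and Y -> X when it is negative.
\begin{equation}
\log \frac{p(\mathcal{D} \mid M_{X \to Y})}{p(\mathcal{D} \mid M_{Y \to X})} > 0
\;\Longrightarrow\; \text{infer } X \to Y.
\end{equation}
```

Although both models describe a joint distribution over the same pair of variables, the independent priors are placed on different factorisations, so the two marginal likelihoods generally differ. That difference is one way to read the asymmetry between causal directions mentioned above.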