This paper introduces a new framework for recovering causal graphs from observational data. It builds on the observation that the distribution of an effect, conditioned on its causes, remains invariant to changes in the prior distribution of those causes. This insight yields a direct test for a candidate causal relationship: measure the variance of the corresponding effect-cause conditional distribution across multiple downsampled subsets of the data, where invariance shows up as low variance. The subsets are selected to reflect different prior distributions over the causes while preserving the effect-cause conditional relationships. Combining this invariance test with the empirical sparsity of most causal graphs, we develop an algorithm that uncovers causal relationships with complexity quadratic in the number of observed variables, reducing processing time by up to 25× compared with state-of-the-art methods. Experiments on a diverse benchmark of large-scale datasets show performance superior or equivalent to that of existing methods, together with improved scalability.
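The invariance property can be illustrated with a toy sketch. This is not the paper's algorithm, and several pieces are assumptions. It uses a known cause `x` with a linear-Gaussian effect `y`. It builds downsampled subsets with a hypothetical rejection rule whose acceptance probability depends only on `x`, which stands in for the paper's subset selection. It also summarizes each conditional distribution by a least-squares slope rather than comparing full distributions. Because the rule shifts p(x) without touching p(y | x), the slope of `y` on `x` should stay near 2 across subsets. The slope of `x` on `y` should drift, because the reverse conditional absorbs the change in the cause's prior.

```python
import numpy as np


def downsample(x, center, width, rng):
    """Hypothetical subset rule: rejection-sample indices with a Gaussian window on x.

    Acceptance depends ONLY on x (the assumed cause), so p(y | x) is unchanged
    inside the subset while the prior p(x) is shifted and/or narrowed.
    """
    accept_prob = np.exp(-((x - center) ** 2) / (2.0 * width**2))
    return np.flatnonzero(rng.random(x.shape[0]) < accept_prob)


def slope(target, predictor):
    """Least-squares slope of target on predictor: a crude summary of p(target | predictor)."""
    a, _ = np.polyfit(predictor, target, deg=1)  # polyfit returns [slope, intercept]
    return a


def dispersion(values):
    """Scale-free spread across subsets: variance divided by the squared mean."""
    values = np.asarray(values)
    return values.var() / values.mean() ** 2


rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)            # cause, prior N(0, 1)
y = 2.0 * x + rng.normal(size=n)  # effect: y | x ~ N(2x, 1)

# Hypothetical window settings (center, width) giving different priors over x.
settings = [(0.0, 0.5), (0.0, 1.0), (0.0, 3.0), (-1.0, 1.0), (1.0, 1.0)]
subsets = [downsample(x, c, w, rng) for c, w in settings]

forward = [slope(y[idx], x[idx]) for idx in subsets]   # summaries of p(y | x)
backward = [slope(x[idx], y[idx]) for idx in subsets]  # summaries of p(x | y)

print(f"dispersion of y|x slopes: {dispersion(forward):.6f}")
print(f"dispersion of x|y slopes: {dispersion(backward):.6f}")
```

The sketch is meant to show the gap between the two printed dispersions, not their absolute values. The abstract does not say how a decision threshold is chosen or how the subsets are selected in practice, so both remain open in this illustration.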