Constraint-based causal discovery relies on numerous conditional independence tests (CITs), but its practical applicability is severely limited by their computational cost, especially because CITs themselves have high time complexity in the sample size. To address this bottleneck, we introduce the Ensemble Conditional Independence Test (E-CIT), a general-purpose, plug-and-play framework. E-CIT follows an intuitive divide-and-aggregate strategy: it partitions the data into subsets, applies a given base CIT independently to each subset, and aggregates the resulting p-values with a novel method grounded in the properties of stable distributions. When the subset size is fixed, this framework reduces the computational complexity of the base CIT to linear in the sample size. Moreover, our tailored p-value combination method comes with theoretical consistency guarantees under mild conditions on the subtests. Experimental results show that E-CIT not only substantially reduces the computational burden of CITs and causal discovery but also achieves competitive performance. Notably, it improves performance in complex testing scenarios, particularly on real-world datasets.
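As a rough illustration of the divide-and-aggregate idea, the Python sketch below splits the samples into subsets of a fixed size, runs a base CIT on each subset, and combines the per-subset p-values. Both components are stand-ins, not the method described here: the base test is a Fisher-z partial-correlation test with a single scalar conditioning variable (which assumes roughly linear-Gaussian data), and the aggregation is the Cauchy combination test, a known combination rule built on the Cauchy distribution (a stable distribution with index 1), used in place of E-CIT's own combination method. The names `ensemble_cit`, `fisher_z_cit`, `cauchy_combine`, and `subset_size` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Vec = Sequence[float]


def pearson(a: Vec, b: Vec) -> float:
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def fisher_z_cit(x: Vec, y: Vec, z: Vec) -> float:
    """Hypothetical base CIT: Fisher-z test of X _||_ Y | Z for one scalar Z."""
    n = len(x)
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    # First-order partial correlation of X and Y given Z.
    r = (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    r = max(min(r, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
    stat = math.sqrt(n - 1 - 3) * math.atanh(r)  # n - |S| - 3 with |S| = 1
    return math.erfc(abs(stat) / math.sqrt(2))  # two-sided normal p-value


def cauchy_combine(pvals: List[float]) -> float:
    """Cauchy combination test (stand-in aggregator, equal weights).

    Under H0 each tan((0.5 - p) * pi) is standard Cauchy, and so is their
    average, so the combined p-value is the Cauchy upper-tail probability.
    """
    eps = 1e-15  # clip to avoid tan(+-pi/2)
    t = sum(math.tan((0.5 - min(max(p, eps), 1 - eps)) * math.pi) for p in pvals)
    t /= len(pvals)
    return 0.5 - math.atan(t) / math.pi


def ensemble_cit(
    x: Vec,
    y: Vec,
    z: Vec,
    base_cit: Callable[[Vec, Vec, Vec], float],
    subset_size: int = 200,
    seed: int = 0,
) -> float:
    """Split the data into fixed-size random subsets, test each, aggregate p-values."""
    n = len(x)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    k = max(1, n // subset_size)  # number of subsets; leftovers join the last one
    pvals = []
    for j in range(k):
        start = j * subset_size
        block = idx[start:] if j == k - 1 else idx[start:start + subset_size]
        pvals.append(base_cit([x[i] for i in block],
                              [y[i] for i in block],
                              [z[i] for i in block]))
    return cauchy_combine(pvals)


if __name__ == "__main__":
    rng = random.Random(42)
    n = 2000
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y_ind = [zi + rng.gauss(0, 1) for zi in z]  # H0 true: X _||_ Y | Z
    y_dep = [xi + rng.gauss(0, 1) for xi in x]  # H0 false: Y depends on X given Z
    print("p (independent):", ensemble_cit(x, y_ind, z, fisher_z_cit))
    print("p (dependent):  ", ensemble_cit(x, y_dep, z, fisher_z_cit))
```

The linear-cost claim follows from simple counting: with n samples and a fixed subset size m, the base test runs on about n/m subsets of size m, so a base CIT costing O(m^3) per call costs O((n/m) · m^3) = O(n m^2) in total, which is linear in n for fixed m.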