Learning graphical conditional independence structures from nonlinear, continuous, or mixed data is a central challenge in machine learning and the sciences, and many existing methods struggle to scale to thousands of samples or hundreds of variables. We introduce two basis-expansion tools for scalable causal discovery. First, the Basis Function BIC (BF-BIC) score uses truncated additive basis expansions to approximate nonlinear dependencies. BF-BIC is consistent under additive models and extends to post-nonlinear (PNL) models via an invertible reparameterization; it remains robust under moderate interactions and supports mixed data through a degenerate-Gaussian embedding for discrete variables. In simulations with fully nonlinear neural causal models (NCMs), BF-BIC outperforms kernel- and constraint-based methods (e.g., KCI, RFCI) in both accuracy and runtime. Second, the Basis Function Likelihood Ratio Test (BF-LRT) provides an approximate conditional independence test that is substantially faster than kernel-based tests while retaining competitive accuracy. Extensive simulations and a real-data application to Canadian wildfire risk show that, when integrated into hybrid searches, BF-based methods enable interpretable and scalable causal discovery. Implementations are available in Python, R, and Java.
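To make the score concrete, the following is a minimal sketch of the basis-expansion idea behind BF-BIC: regress a variable on a truncated basis expansion of its parents and score the fit with BIC. The polynomial basis, function names, and parameters here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def poly_basis(X, degree=3):
    # Truncated polynomial expansion of each column of X.
    # Illustrative choice of basis family; the paper's basis may differ.
    return np.hstack([X**d for d in range(1, degree + 1)])

def bf_bic(y, X, degree=3):
    """BIC of an OLS fit of y on a truncated basis expansion of X
    (a simplified stand-in for a BF-BIC-style local score)."""
    n = len(y)
    Phi = np.column_stack([np.ones(n), poly_basis(X, degree)])
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    resid = y - Phi @ beta
    sigma2 = resid @ resid / n            # MLE of residual variance
    k = Phi.shape[1] + 1                  # coefficients plus variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return -2 * loglik + k * np.log(n)    # lower is better

# A quadratic dependence that a purely linear score misses:
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(500, 1))
y = x[:, 0] ** 2 + 0.1 * rng.standard_normal(500)
bic_linear = bf_bic(y, x, degree=1)   # linear fit: high BIC
bic_basis = bf_bic(y, x, degree=3)    # basis expansion: much lower BIC
```

The BIC penalty `k * log(n)` is what keeps the truncated expansion from overfitting as more basis columns are added.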
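The BF-LRT can likewise be sketched as a nested-regression likelihood-ratio test: fit Y on a basis expansion of Z alone, then on the expansions of Z and X together, and compare the residual sums of squares. This is a simplified illustration under a polynomial-basis assumption, not the paper's exact procedure.

```python
import numpy as np

def poly_basis(M, degree=3):
    # Illustrative truncated polynomial basis (the paper's choice may differ).
    return np.hstack([M**d for d in range(1, degree + 1)])

def _rss(y, Phi):
    # Residual sum of squares of the least-squares fit of y on Phi.
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    r = y - Phi @ beta
    return r @ r

def bf_lrt(x, y, z, degree=3):
    """Approximate test of X independent of Y given Z: compare regressions
    of Y on basis(Z) versus basis(Z) plus basis(X). The statistic is
    asymptotically chi-squared with `degree` degrees of freedom."""
    n = len(y)
    ones = np.ones((n, 1))
    Phi0 = np.hstack([ones, poly_basis(z.reshape(n, -1), degree)])
    Phi1 = np.hstack([Phi0, poly_basis(x.reshape(n, 1), degree)])
    return n * np.log(_rss(y, Phi0) / _rss(y, Phi1))

# Y depends on Z nonlinearly; X is independent noise in the null case.
rng = np.random.default_rng(1)
z = rng.uniform(-2, 2, 500)
x = rng.standard_normal(500)
y = np.sin(z) + 0.1 * rng.standard_normal(500)
stat_indep = bf_lrt(x, y, z)                # small: fail to reject
stat_dep = bf_lrt(x, y + 0.5 * x, z)        # X now affects Y: large
# Compare each statistic to the chi-squared critical value
# (7.815 for df=3 at alpha=0.05) to decide independence.
```

Because only one pair of least-squares fits is needed per test, this kind of statistic avoids the kernel-matrix computations that dominate the cost of tests such as KCI.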