Causal structure learning, also known as causal discovery, aims to estimate causal relationships between variables in the form of a causal directed acyclic graph (DAG) from observational data. One of the major frameworks is the order-based approach, which first estimates a topological order of the underlying DAG and then prunes spurious edges from the fully connected DAG induced by the estimated order. Previous studies often focus on the former ordering step because it can dramatically reduce the search space of DAGs. In practice, however, the latter pruning step is equally crucial for ensuring both computational efficiency and estimation accuracy. Most existing methods employ a pruning technique based on generalized additive models and hypothesis testing, commonly known as CAM-pruning. This approach can become a computational bottleneck, as it requires repeatedly fitting additive models for all variables, and it may also harm estimation quality due to multiple testing. To address these issues, we introduce a new pruning method based on sparse additive models, which enables direct pruning of redundant edges without relying on hypothesis testing. We propose an efficient algorithm for learning sparse additive models by combining the randomized tree embedding technique with group-wise sparse regression. Experimental results on both synthetic and real datasets demonstrate that our method is significantly faster than existing pruning methods while maintaining comparable or superior accuracy.
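To make the pruning idea concrete, the following is a minimal sketch, not the paper's implementation. It replaces the randomized tree embedding with random decision stumps (binary threshold indicators) as a crude stand-in, and uses a small proximal-gradient group lasso for the group-wise sparse regression. For each node, all earlier nodes in the topological order are candidate parents; each candidate contributes one group of embedded features, and a parent is pruned when the group-lasso penalty zeroes out its entire group. All function names, the penalty level `lam`, and the embedding size are illustrative assumptions.

```python
import numpy as np

def random_stump_embedding(X, n_stumps, rng):
    # One group of binary indicator features per input column.
    # Random-threshold stumps are a crude stand-in for the
    # randomized tree embedding used in the paper.
    n, d = X.shape
    feats, groups, col = [], [], 0
    for j in range(d):
        thr = rng.uniform(X[:, j].min(), X[:, j].max(), size=n_stumps)
        feats.append((X[:, j][:, None] > thr[None, :]).astype(float))
        groups.append(np.arange(col, col + n_stumps))
        col += n_stumps
    return np.hstack(feats), groups

def group_lasso(Z, y, groups, lam, n_iter=500):
    # Proximal gradient descent for
    #   min_w ||y - Z w||^2 / (2n) + lam * sum_g ||w_g||_2
    n, p = Z.shape
    w = np.zeros(p)
    step = n / np.linalg.norm(Z, 2) ** 2  # 1 / Lipschitz constant
    for _ in range(n_iter):
        v = w - step * (Z.T @ (Z @ w - y) / n)
        for g in groups:  # block soft-thresholding per group
            norm = np.linalg.norm(v[g])
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            w[g] = scale * v[g]
    return w

def prune(X, order, lam=0.05, n_stumps=8, seed=0):
    # Regress each node on the embedded features of all earlier
    # nodes in the order; parents whose feature group is zeroed
    # out by the group-lasso penalty are pruned -- no hypothesis
    # testing involved.
    rng = np.random.default_rng(seed)
    parents = {}
    for k, j in enumerate(order):
        cands = order[:k]
        if not cands:
            parents[j] = []
            continue
        Z, groups = random_stump_embedding(X[:, cands], n_stumps, rng)
        w = group_lasso(Z, X[:, j] - X[:, j].mean(), groups, lam)
        parents[j] = [cands[i] for i, g in enumerate(groups)
                      if np.linalg.norm(w[g]) > 1e-8]
    return parents
```

A larger `lam` prunes more aggressively; in practice it would be chosen by cross-validation or an information criterion, and the stump embedding would be replaced by an actual randomized tree embedding fitted on the candidate parents.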