Causal discovery is a crucial first step in establishing causality from empirical data and background knowledge. Among the many algorithms developed for this purpose, score-matching methods have demonstrated superior performance across various evaluation metrics, particularly for the commonly encountered additive nonlinear causal models. However, current score-matching-based algorithms are designed primarily for independent and identically distributed (i.i.d.) data. More importantly, they suffer from high computational complexity because of the pruning step required to handle dense directed acyclic graphs (DAGs). To improve the scalability of score matching, we develop a new parent-finding subroutine for leaf nodes in DAGs that significantly accelerates the most time-consuming part of the process, the pruning step. The result is a more efficient score-matching algorithm, termed Parent Identification-based Causal structure learning for both i.i.d. and temporal data on networKs, or PICK. PICK extends the scope of existing algorithms: it handles both static and temporal data on networks with weak network interference. It can therefore cope efficiently with increasingly complex datasets that exhibit spatial and temporal dependencies, which are commonly encountered in academia and industry, and it accelerates score-matching-based methods while maintaining high accuracy in real-world applications.
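The abstract does not spell out how leaf nodes are found, so the following is only a minimal sketch of the leaf criterion commonly used in score-matching causal discovery, not PICK itself. For a nonlinear additive model with Gaussian noise, a variable is a leaf when the corresponding diagonal entry of the Hessian of the log-density has zero variance over the data. The toy model (x1 ~ N(0, 1), x2 = x1² + N(0, 1)), the closed-form log-density, the finite-difference Hessian, and all function names are assumptions made for illustration; a real method would estimate the score from samples and would then still need to find each leaf's parents.

```python
import random
import statistics


def log_density(x1, x2):
    """Unnormalised log p(x1, x2) for the assumed toy model
    x1 ~ N(0, 1), x2 = x1**2 + N(0, 1)."""
    return -0.5 * x1 ** 2 - 0.5 * (x2 - x1 ** 2) ** 2


def hessian_diagonal(x, h=1e-2):
    """Central finite-difference estimate of d^2 log p / dx_j^2 for each j.
    (Stand-in for a learned score model; an assumption of this sketch.)"""
    base = log_density(*x)
    diag = []
    for j in range(len(x)):
        up = list(x)
        down = list(x)
        up[j] += h
        down[j] -= h
        diag.append((log_density(*up) - 2 * base + log_density(*down)) / h ** 2)
    return diag


def find_leaf(samples):
    """Return the index whose Hessian-diagonal entry varies least across
    the samples, together with the per-variable variances."""
    rows = [hessian_diagonal(s) for s in samples]
    variances = [statistics.pvariance(col) for col in zip(*rows)]
    leaf = min(range(len(variances)), key=variances.__getitem__)
    return leaf, variances


# Draw samples from the toy model with a fixed seed for reproducibility.
rng = random.Random(0)
samples = []
for _ in range(2000):
    x1 = rng.gauss(0, 1)
    x2 = x1 ** 2 + rng.gauss(0, 1)
    samples.append((x1, x2))

leaf, variances = find_leaf(samples)
print("leaf index:", leaf)  # index 1 corresponds to x2, the child of x1
print("variances:", variances)
```

In this toy model the entry for x2 is constant (the log-density is quadratic in x2), so its variance is numerically zero, while the entry for x1 depends on both variables. Repeating the step after removing the identified leaf yields a topological order; the pruning that follows is the stage the abstract identifies as the bottleneck.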