Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

Causal analysis has become essential to understanding the underlying causes of phenomena across many fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented in two respects: methodologies are inconsistent, in that there is no universal classification standard for existing methods, and comprehensive evaluations are lacking, in that benchmarks rarely analyze data characteristics jointly with algorithm performance. This study addresses these gaps through an exhaustive review and empirical evaluation of causal discovery methods for numerical data, aiming to provide a clearer and more structured understanding of the field. We begin with a comprehensive literature review spanning more than two decades, analyzing over 200 academic articles and identifying more than 40 representative algorithms. From this analysis we develop a structured taxonomy tailored to the complexities of causal discovery, which categorizes methods into six main types. To address the lack of comprehensive evaluations, we then conduct an extensive empirical assessment of 29 causal discovery algorithms on multiple synthetic and real-world datasets. We categorize the synthetic datasets by size, linearity, and noise distribution, employ five evaluation metrics, and summarize top-three algorithm recommendations as guidelines for users in various data scenarios. Our results highlight a significant impact of dataset characteristics on algorithm performance. Moreover, we develop a metadata extraction strategy with an accuracy exceeding 80% to assist users in algorithm selection on unknown datasets. Based on these insights, we offer practical guidelines to help users choose the most suitable causal discovery methods for their specific datasets.
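To make the synthetic benchmarking setup concrete, the following is a minimal sketch of how datasets varying in size, linearity, and noise distribution might be generated and how a recovered graph might be scored. It is an illustration under stated assumptions, not the procedure used in this study: the function names, the `tanh` nonlinearity, the weight range, and the choice of structural Hamming distance (SHD) as the metric are all hypothetical, since the abstract does not name the five metrics or the data-generating process.

```python
import numpy as np


def random_dag(n_nodes, edge_prob, rng):
    """Return a binary adjacency matrix A where A[i, j] = 1 means i -> j."""
    # A strictly upper-triangular matrix is acyclic by construction.
    upper = np.triu(rng.random((n_nodes, n_nodes)) < edge_prob, k=1)
    return upper.astype(int)


def sample_sem(adj, n_samples, linear, noise, rng):
    """Sample data from an additive-noise structural equation model.

    Assumed settings (hypothetical): edge weights in [0.5, 2.0], a tanh
    link for the nonlinear case, and Gaussian or uniform noise.
    """
    n_nodes = adj.shape[0]
    weights = adj * rng.uniform(0.5, 2.0, size=adj.shape)
    data = np.zeros((n_samples, n_nodes))
    for j in range(n_nodes):  # column order is already a topological order
        if noise == "gaussian":
            eps = rng.normal(size=n_samples)
        elif noise == "uniform":
            eps = rng.uniform(-1.0, 1.0, size=n_samples)
        else:
            raise ValueError(f"unknown noise type: {noise}")
        parent_signal = data @ weights[:, j]
        if not linear:
            parent_signal = np.tanh(parent_signal)
        data[:, j] = parent_signal + eps
    return data


def structural_hamming_distance(true_adj, est_adj):
    """Count missing, extra, and reversed edges (a reversal counts once)."""
    diff = np.abs(np.asarray(true_adj) - np.asarray(est_adj))
    # A reversed edge mismatches at both (i, j) and (j, i); merge the pair.
    diff = diff + diff.T
    diff[diff > 1] = 1
    return int(np.triu(diff).sum())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = random_dag(n_nodes=5, edge_prob=0.4, rng=rng)
    # Enumerate one dataset per (size, linearity, noise) scenario.
    for n_samples in (100, 1000):
        for linear in (True, False):
            for noise in ("gaussian", "uniform"):
                data = sample_sem(truth, n_samples, linear, noise, rng)
                print(n_samples, linear, noise, data.shape)
```

In a benchmark of this kind, each generated dataset would be passed to a causal discovery algorithm, and the estimated adjacency matrix would be compared with `truth` using `structural_hamming_distance` or another metric, so that performance can be reported per data scenario.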