Causal discovery (CD) is the process of identifying the cause-effect relationships among the variables of a system from data. Over the years, several methods have been developed, based primarily on the statistical properties of data, to uncover the underlying causal mechanism. In this study, we present an extensive discussion of the methods designed to perform causal discovery from both independent and identically distributed (i.i.d.) data and time series data. We first introduce the common terminology of causal discovery and then provide a comprehensive discussion of the algorithms designed to identify causal edges in different settings. We further discuss some of the benchmark datasets available for evaluating causal discovery methods, the tools and software packages available for performing causal discovery, and the common metrics used to evaluate these methods. We also test some common causal discovery algorithms on different benchmark datasets and compare their performance. Finally, we present the common challenges in causal discovery and discuss its applications in multiple areas of interest.