We propose a constraint-based algorithm that infers causal networks from data by automatically determining causal relevance thresholds, which we call topological thresholds. We present two methods for determining the threshold: the first seeks a set of edges that leaves no node disconnected from the network; the second seeks a large causal connected component in the data. We tested both methods on synthetic and real discrete data and compared the results with those of the PC algorithm, which we took as the benchmark. We show that the new algorithm is generally faster and more accurate than the PC algorithm. Determining the thresholds requires choosing a measure of causality. We tested our methods with Fisher correlations, commonly used in the PC algorithm (for instance in \cite{kalisch2005}), and further propose a discrete, asymmetric measure of causality, which we call Net Influence, that gave very good results when inferring causal networks from discrete data. This measure allows edge directions to be inferred while the thresholds are applied, which speeds up the inference of causal DAGs.
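The first thresholding method can be illustrated with a minimal sketch. The abstract does not say how the threshold is computed, so the following is only one possible reading: given a symmetric matrix of pairwise causal-relevance scores, take the largest threshold that still leaves every node with at least one edge. The names `scores`, `no_isolated_node_threshold`, and `edges_above` are hypothetical and not taken from the paper.

```python
def no_isolated_node_threshold(scores):
    """Largest t such that keeping edges with score >= t isolates no node.

    Assumption (not from the paper): `scores` is a symmetric square matrix
    (list of lists) of pairwise relevance scores; the diagonal is ignored.
    """
    n = len(scores)
    # Strongest edge incident to each node.
    strongest = [max(scores[i][j] for j in range(n) if j != i) for i in range(n)]
    # The weakest of these is the highest threshold that keeps every node attached.
    return min(strongest)


def edges_above(scores, threshold):
    """Undirected edges (i, j), i < j, whose score meets the threshold."""
    n = len(scores)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if scores[i][j] >= threshold]


# Toy, made-up scores for four variables.
scores = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.1],
    [0.2, 0.3, 0.0, 0.6],
    [0.1, 0.1, 0.6, 0.0],
]
threshold = no_isolated_node_threshold(scores)
kept = edges_above(scores, threshold)
```

In this toy case every node keeps at least one edge, yet the graph splits into two components. That gap is what the second method, which looks for a large connected component, would address differently.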