Identifying causality is a challenging task in many data-intensive scenarios, and many algorithms have been proposed for it. However, most of them focus on learning the directed acyclic graph (DAG) of a Bayesian network (BN). These BN-based models offer only limited causal explainability because of the Markov equivalence class problem. Moreover, they depend on the assumption of stationarity, whereas many time series sampled from complex systems are nonstationary. Nonstationary time series introduce a dataset shift problem, which leads to unsatisfactory performance of these algorithms. To fill these gaps, this paper proposes a novel causation model named Unique Causal Network (UCN). Unlike previous BN-based models, UCN accounts for the influence of time delay, and the resulting network structure is proved to be unique, which addresses the Markov equivalence class problem. Furthermore, based on the decomposability property of UCN, a higher-order causal entropy (HCE) algorithm is designed to identify the structure of UCN in a distributed way. The HCE algorithm measures the strength of causality with a nearest-neighbors entropy estimator, which works well on nonstationary time series. Finally, extensive experiments validate that the HCE algorithm achieves state-of-the-art accuracy on nonstationary time series compared with the baseline algorithms.
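To make the notion of a nearest-neighbors entropy estimator concrete, the following is a minimal sketch of the generic Kozachenko-Leonenko estimator of differential entropy. It is an illustration only: the abstract does not state which estimator HCE uses or how causal entropy is assembled from it, so the choice of estimator, the function names `knn_entropy` and `digamma_int`, and the default `k=3` are all assumptions. The sketch also assumes continuous data without duplicate points, since a zero neighbor distance has no logarithm.

```python
import math

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko estimate of differential entropy, in nats.

    samples: sequence of equal-length tuples, each a point in R^d.
    k: which nearest neighbor to use (hypothetical default, not from the paper).
    Assumes no duplicate points, so every k-th neighbor distance is positive.
    """
    n = len(samples)
    d = len(samples[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of samples")

    # Log-volume of the d-dimensional unit ball under the Euclidean norm.
    log_unit_ball = (d / 2.0) * math.log(math.pi) - math.lgamma(d / 2.0 + 1.0)

    # Sum of log distances from each point to its k-th nearest neighbor.
    # Brute force, O(n^2 log n); adequate for a small illustrative sample.
    log_radius_sum = 0.0
    for i, x in enumerate(samples):
        dists = sorted(math.dist(x, y) for j, y in enumerate(samples) if j != i)
        log_radius_sum += math.log(dists[k - 1])

    return digamma_int(n) - digamma_int(k) + log_unit_ball + d * log_radius_sum / n
```

Because the estimate depends only on local neighbor distances, it needs no parametric model of the underlying density. A conditional or causal quantity would have to be built from several such entropy terms, which this sketch does not attempt.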