Learning causal graphs from multivariate time series is a ubiquitous challenge in application domains dealing with time-dependent systems, such as Earth sciences, biology, and engineering. Recent methods for this causal discovery task have shown considerable skill, notably time-series adaptations of the popular conditional-independence-based learning framework. However, uncertainty estimation is challenging for conditional-independence-based methods. Here, we introduce a novel bootstrap approach designed for time series causal discovery that preserves the temporal dependencies and lag structure. It can be combined with a range of time series causal discovery methods and provides a measure of confidence for the links of the resulting time series graphs. Beyond confidence estimation, aggregating the bootstrapped graphs by majority voting (bagging) yields bagged causal discovery methods. In this work, we combine this approach with the state-of-the-art conditional-independence-based algorithm PCMCI+. With extensive numerical experiments, we empirically demonstrate that, in addition to providing confidence measures for links, Bagged-PCMCI+ improves both precision and recall compared to its base algorithm PCMCI+, at the cost of higher computational demand. These statistical performance improvements are especially pronounced in the more challenging settings (short time series, large number of variables, high autocorrelation). Our bootstrap approach is not specific to PCMCI+ and can be of considerable use in many real-world applications.
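The bootstrap-and-vote idea can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the lag structure is preserved by resampling, with replacement, whole windows of length `tau_max + 1` (so each sampled time step carries its own lagged past), and `toy_discovery` is a hypothetical stand-in that thresholds lagged correlations instead of running PCMCI+. The helper names (`lagged_windows`, `bootstrap_windows`, `bagged_discovery`) are likewise hypothetical. The link frequency across bootstrap graphs serves as the confidence measure, and a link enters the bagged graph when it appears in more than half of them.

```python
import numpy as np

def lagged_windows(data, tau_max):
    """Stack windows data[t - tau_max : t + 1] for t = tau_max .. T-1.
    Shape (T - tau_max, tau_max + 1, N); along axis 1, index tau_max - tau holds X(t - tau)."""
    T, _ = data.shape
    return np.stack([data[t - tau_max : t + 1] for t in range(tau_max, T)])

def bootstrap_windows(windows, rng):
    """Resample whole windows with replacement; each window keeps its internal lag structure."""
    idx = rng.integers(0, len(windows), size=len(windows))
    return windows[idx]

def toy_discovery(windows, tau_max, threshold=0.3):
    """Hypothetical stand-in (NOT PCMCI+): marks link X_i(t - tau) -> X_j(t)
    if the absolute lagged correlation exceeds `threshold`. Lag 0 is left empty."""
    _, _, N = windows.shape
    graph = np.zeros((N, N, tau_max + 1), dtype=bool)
    present = windows[:, tau_max, :]            # X(t)
    for tau in range(1, tau_max + 1):
        past = windows[:, tau_max - tau, :]     # X(t - tau)
        for i in range(N):
            for j in range(N):
                r = np.corrcoef(past[:, i], present[:, j])[0, 1]
                graph[i, j, tau] = abs(r) > threshold
    return graph

def bagged_discovery(data, tau_max, n_boot=100, seed=0, discover=toy_discovery):
    """Run `discover` on n_boot bootstrap samples; return (majority-vote graph, link frequencies)."""
    rng = np.random.default_rng(seed)
    windows = lagged_windows(data, tau_max)
    graphs = np.stack([discover(bootstrap_windows(windows, rng), tau_max)
                       for _ in range(n_boot)])
    confidence = graphs.mean(axis=0)   # fraction of bootstrap graphs containing each link
    bagged_graph = confidence > 0.5    # majority vote
    return bagged_graph, confidence

# Usage on synthetic data: X0 is autocorrelated and drives X1 at lag 1.
rng = np.random.default_rng(42)
T = 500
data = np.zeros((T, 2))
for t in range(1, T):
    data[t, 0] = 0.5 * data[t - 1, 0] + rng.standard_normal()
    data[t, 1] = 0.8 * data[t - 1, 0] + rng.standard_normal()

bagged, conf = bagged_discovery(data, tau_max=2, n_boot=50)
print("X0 -> X1 at lag 1:", bagged[0, 1, 1], "confidence:", conf[0, 1, 1])
```

Because the toy method does not condition on other variables, it may report indirect links (for example, X0 at lag 2 to X1) that a conditional-independence method such as PCMCI+ is designed to remove. Swapping in a real method only requires a function that maps resampled windows and `tau_max` to a boolean link array of the same shape.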