Causal discovery methods have demonstrated the ability to identify the time series graphs that represent the causal temporal dependency structure of dynamical systems. However, they do not provide a measure of confidence for the estimated links. Here, we introduce a novel bootstrap aggregation (bagging) and confidence measure method that is combined with time series causal discovery. The method makes it possible to measure confidence for the links of the time series graphs estimated by causal discovery methods. This is done by bootstrapping the original time series data set while preserving temporal dependencies. In addition to providing confidence measures, aggregating the bootstrapped graphs by majority voting yields a final aggregated output graph. In this work, we combine our approach with the state-of-the-art conditional-independence-based algorithm PCMCI+. Through extensive numerical experiments, we empirically demonstrate that, in addition to providing confidence measures for links, Bagged-PCMCI+ improves the precision and recall of its base algorithm PCMCI+. Specifically, Bagged-PCMCI+ has higher detection power for adjacencies and higher precision in orienting contemporaneous edges, while at the same time showing a lower rate of false positives. These performance improvements are especially pronounced in the more challenging settings (short time series, large number of variables, high autocorrelation). Our bootstrap approach can also be combined with other time series causal discovery algorithms and can be of considerable use in many real-world applications, especially when confidence measures for the links are desired.
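The bagging procedure described in the abstract (bootstrap the data while preserving temporal dependencies, run causal discovery on each resample, then take link frequencies as confidence and a majority vote as the output graph) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say how temporal dependencies are preserved, so the sketch resamples whole lagged windows with replacement, and `toy_link_detector` is a hypothetical lagged-correlation threshold standing in for the base algorithm. It is not PCMCI+ and it ignores contemporaneous links and edge orientation. The names `lagged_windows`, `bagged_graph`, and `simulate_toy_data` are likewise hypothetical.

```python
import numpy as np


def simulate_toy_data(T=500, seed=1):
    """Toy two-variable system (illustrative only): X0 is AR(1), X0(t-1) drives X1(t)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((T, 2))
    data = np.zeros((T, 2))
    for t in range(1, T):
        data[t, 0] = 0.5 * data[t - 1, 0] + noise[t, 0]
        data[t, 1] = 0.8 * data[t - 1, 0] + noise[t, 1]
    return data


def lagged_windows(data, tau_max):
    """Stack the windows (X_{t-tau_max}, ..., X_t) for every valid t.

    Returns an array of shape (T - tau_max, tau_max + 1, N). Resampling whole
    windows keeps each sample's lag structure intact (an assumed scheme).
    """
    T = data.shape[0]
    return np.stack([data[t - tau_max:t + 1] for t in range(tau_max, T)])


def toy_link_detector(windows, threshold=0.3):
    """Hypothetical stand-in for a causal discovery method (NOT PCMCI+).

    Marks a link i(t - tau) -> j(t) when the absolute lagged correlation
    exceeds `threshold`. The tau = 0 slice is left empty.
    """
    _, n_lags, n_vars = windows.shape
    tau_max = n_lags - 1
    present = windows[:, -1, :]  # values at time t
    graph = np.zeros((n_vars, n_vars, tau_max + 1), dtype=bool)
    for tau in range(1, tau_max + 1):
        past = windows[:, -1 - tau, :]  # values at time t - tau
        # Cross-correlation block: rows index past variables, columns present ones.
        corr = np.corrcoef(past.T, present.T)[:n_vars, n_vars:]
        graph[:, :, tau] = np.abs(corr) > threshold
    return graph


def bagged_graph(data, tau_max=2, n_boot=100, seed=0):
    """Bootstrap the lagged windows, rerun the detector, and aggregate.

    Returns (aggregated, confidence): confidence is the fraction of bootstrap
    graphs containing each link; aggregated keeps links found in a majority.
    """
    rng = np.random.default_rng(seed)
    windows = lagged_windows(data, tau_max)
    n_samples = windows.shape[0]
    n_vars = data.shape[1]
    votes = np.zeros((n_vars, n_vars, tau_max + 1))
    for _ in range(n_boot):
        idx = rng.integers(0, n_samples, size=n_samples)  # draw with replacement
        votes += toy_link_detector(windows[idx])
    confidence = votes / n_boot
    aggregated = confidence > 0.5  # majority vote
    return aggregated, confidence


if __name__ == "__main__":
    aggregated, confidence = bagged_graph(simulate_toy_data())
    print("confidence for X0(t-1) -> X1(t):", confidence[0, 1, 1])
```

In this sketch `confidence[i, j, tau]` is the share of bootstrap graphs containing the link from variable `i` at lag `tau` to variable `j`. Because the toy detector relies on plain correlation, it will also flag indirect and autocorrelation-induced links; a conditional-independence-based method such as PCMCI+ is designed to avoid exactly that, so only the resample-and-vote structure carries over.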