Granger causality is among the most widely used data-driven approaches for causal analysis of time series data, with applications in areas including economics, molecular biology, and neuroscience. Two of the main challenges of this methodology are 1) over-fitting due to limited data duration and 2) correlated process noise acting as a confounding factor; both lead to errors in identifying causal influences. Sparse estimation via the LASSO has successfully addressed these challenges for parameter estimation. However, the classical statistical tests for Granger causality rely on the asymptotic analysis of ordinary least squares; as a result, they require long data records to be useful and are not immune to confounding effects. In this work, we address this disconnect by introducing a LASSO-based statistic and studying its non-asymptotic properties under the assumption that the true models admit sparse autoregressive representations. We establish fundamental limits for reliable identification of Granger causal influences using the proposed LASSO-based statistic. We further characterize the false positive error probability and the test power of a simple thresholding rule for identifying Granger causal effects, and provide two methods for setting the threshold in a data-driven fashion. Simulation studies and an application to real data, comparing our proposed method against ordinary least squares and existing LASSO-based methods in detecting Granger causal influences, corroborate our theoretical results.
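To make the idea of a LASSO-based Granger statistic with a thresholding rule concrete, here is a minimal sketch under stated assumptions. It is not the authors' statistic or implementation. The abstract does not define the statistic, so we hypothetically take the log-ratio of residual sums of squares from a reduced LASSO fit (all lags of the candidate source removed) and a full LASSO fit of a VAR(p) model, mirroring the classical OLS Granger statistic. We also assume a fixed penalty `lam` and a fixed hypothetical threshold in place of the two data-driven threshold-selection methods. The helper names (`lasso_cd`, `lagged_design`, `lasso_gc_statistic`) and the simulated two-channel VAR are illustrative only.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent:
    minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n  # curvature of each coordinate
    r = np.array(y, dtype=float)       # residual y - X b, starting from b = 0
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual (b_j added back)
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r += X[:, j] * (b[j] - new)  # keep the residual in sync with b
            b[j] = new
    return b


def lagged_design(x, p):
    """Rows are t = p..T-1; column i*p + (k-1) holds x[t-k, i] for k = 1..p."""
    T, m = x.shape
    cols = [x[p - k:T - k, i] for i in range(m) for k in range(1, p + 1)]
    return np.column_stack(cols)


def lasso_gc_statistic(x, source, target, p, lam):
    """Hypothetical LASSO-based Granger statistic for source -> target:
    log(reduced-model RSS / full-model RSS), where the reduced model
    drops every lag of the source channel."""
    X = lagged_design(x, p)
    y = x[p:, target]
    keep = [c for c in range(X.shape[1]) if c // p != source]
    b_full = lasso_cd(X, y, lam)
    b_red = lasso_cd(X[:, keep], y, lam)
    rss_full = np.sum((y - X @ b_full) ** 2)
    rss_red = np.sum((y - X[:, keep] @ b_red) ** 2)
    return float(np.log(rss_red / rss_full))


# Simulated VAR(1): channel 0 drives channel 1, not the other way round.
rng = np.random.default_rng(0)
T, p, lam = 300, 2, 0.05
x = np.zeros((T, 2))
for t in range(1, T):
    x[t, 0] = 0.5 * x[t - 1, 0] + rng.standard_normal()
    x[t, 1] = 0.4 * x[t - 1, 1] + 0.6 * x[t - 1, 0] + rng.standard_normal()

THRESHOLD = 0.1  # hypothetical fixed threshold, not a data-driven choice
stat_01 = lasso_gc_statistic(x, source=0, target=1, p=p, lam=lam)
stat_10 = lasso_gc_statistic(x, source=1, target=0, p=p, lam=lam)
print(f"0 -> 1: {stat_01:.3f}, causal: {stat_01 > THRESHOLD}")
print(f"1 -> 0: {stat_10:.3f}, causal: {stat_10 > THRESHOLD}")
```

In this simulation only channel 0 influences channel 1, so the statistic from channel 0 to channel 1 should clear the threshold while the reverse direction should not. Setting `lam=0.0` makes the coordinate-descent fit approach the OLS solution, and multiplying the log-ratio by the effective sample size `T - p` then gives the classical likelihood-ratio Granger statistic, whose asymptotic chi-square calibration is what the abstract contrasts with a non-asymptotic LASSO analysis.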