Vector autoregression has been widely used for modeling and analysis of multivariate time series data. In high-dimensional settings, sparsity-inducing regularization of the model parameters yields interpretable models with good forecasting performance. However, in many data applications, such as those in neuroscience, the Granger causality graph estimates from existing vector autoregression methods tend to be quite dense and difficult to interpret, unless one compromises on the goodness-of-fit. To address this issue, this paper proposes to incorporate a commonly used structural assumption: that the ground-truth graph is largely connected, in the sense that it contains at most a few connected components. We take a Bayesian approach and develop a novel tree-rank prior distribution for the regression coefficients. Specifically, this prior forces the non-zero coefficients to appear only on the union of a few spanning trees. Since each spanning tree connects $p$ nodes with only $p-1$ edges, the prior achieves both high connectivity and high sparsity. We develop a computationally efficient Gibbs sampler that scales to large sample sizes and high dimensions. In an analysis of test-retest functional magnetic resonance imaging data, our model produces a much more interpretable graph estimate than popular existing approaches. We also establish appealing properties of the new method, including efficient computation, mild stability conditions, and posterior consistency.
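The edge-count argument can be checked with a toy sketch. This is not the authors' tree-rank prior or Gibbs sampler. It draws $k$ uniformly random labeled spanning trees on $p$ nodes (decoded from random Prüfer sequences, a choice made here only for simplicity), takes their union as an undirected support graph, and confirms that the union has at most $k(p-1)$ edges while forming a single connected component. The helper names `random_spanning_tree` and `num_components`, the values $p=50$ and $k=3$, and the use of an undirected graph are illustrative assumptions.

```python
import random


def random_spanning_tree(p, rng):
    """Edges (i, j) with i < j of a uniformly random labeled tree on p nodes.

    Illustrative helper (hypothetical, not from the paper): decodes a random
    Pruefer sequence of length p - 2 into a tree with exactly p - 1 edges.
    """
    if p < 2:
        return set()
    seq = [rng.randrange(p) for _ in range(p - 2)]
    degree = [1] * p
    for v in seq:
        degree[v] += 1
    edges = set()
    for v in seq:
        # Attach the smallest current leaf to the next node in the sequence.
        leaf = min(u for u in range(p) if degree[u] == 1)
        edges.add((min(leaf, v), max(leaf, v)))
        degree[leaf] -= 1
        degree[v] -= 1
    # Exactly two nodes of degree 1 remain; joining them completes the tree.
    u, w = [x for x in range(p) if degree[x] == 1]
    edges.add((min(u, w), max(u, w)))
    return edges


def num_components(p, edges):
    """Number of connected components of an undirected graph (union-find)."""
    parent = list(range(p))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = p
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            comps -= 1
    return comps


p, k = 50, 3  # illustrative sizes
rng = random.Random(0)
union = set()
for _ in range(k):
    union |= random_spanning_tree(p, rng)  # shared edges are counted once

print(len(union), "edges; bound k(p-1) =", k * (p - 1),
      "; all pairs =", p * (p - 1) // 2)
print("components:", num_components(p, union))
```

Because any single spanning tree is already connected, the union is always one component, while its edge count stays at most $k(p-1)$, far below the $p(p-1)/2$ possible pairs of a dense graph.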