Many important phenomena in scientific fields such as climate science, neuroscience, and epidemiology are naturally represented as spatiotemporal gridded data with complex interactions. For example, in climate science, researchers aim to uncover how large-scale events, such as the North Atlantic Oscillation (NAO) and the Antarctic Oscillation (AAO), influence other global processes. Inferring causal relationships from these data is challenging, and the difficulty is compounded by their high dimensionality and by the correlations between spatially proximate points. We present SPACY (SPAtiotemporal Causal discoverY), a novel framework based on variational inference, designed to explicitly model latent time series and their causal relationships from spatially confined modes in the data. Our method uses an end-to-end training process that maximizes an evidence lower bound (ELBO) on the data log-likelihood. Theoretically, we show that, under some conditions, the latent variables are identifiable up to transformation by an invertible matrix. Empirically, we show that SPACY outperforms state-of-the-art baselines on synthetic data, remains scalable for large grids, and identifies key known phenomena from real-world climate data.
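For orientation, the two technical statements above can be written in generic notation. This is only a sketch of the standard forms, not SPACY's specific objective or theorem: the symbols X (gridded observations), Z (latent time series), θ and φ (model and variational parameters), A, and D (number of latent modes) are assumptions introduced here, and the exact factorization and identifiability conditions are those given in the paper itself.

```latex
% Generic evidence lower bound (ELBO) maximized in variational inference;
% the notation is assumed for illustration and is not taken from the paper.
\log p_\theta(X) \;\ge\;
  \mathbb{E}_{q_\phi(Z \mid X)}\!\left[\log p_\theta(X \mid Z)\right]
  \;-\; \mathrm{KL}\!\left(q_\phi(Z \mid X)\,\middle\|\,p_\theta(Z)\right)

% Identifiability up to an invertible matrix: the recovered latents \hat{Z}_t
% match the true latents Z_t up to an invertible linear map A.
\hat{Z}_t = A\,Z_t, \qquad A \in \mathbb{R}^{D \times D}, \quad \det A \neq 0
```

Read this way, "identifiable up to transformation by an invertible matrix" means the latent time series are recovered as an invertible linear mixture of the true ones, not necessarily coordinate by coordinate.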