Uncovering causal relationships in multivariate time series is an essential yet challenging objective across a broad range of disciplines, from climate science to healthcare. Such data entail linear or non-linear relationships and usually follow multiple regimes that are unknown a priori. Existing causal discovery methods can infer summary causal graphs from heterogeneous data when the regimes are known, but they fall short of jointly learning both the regimes and the corresponding causal graphs. In this paper, we introduce CASTOR, a novel framework designed to learn causal relationships in heterogeneous time series composed of multiple regimes, each governed by a distinct causal graph. By maximizing a score function via the expectation-maximization (EM) algorithm, CASTOR infers the number of regimes and learns linear or non-linear causal relationships within each regime. We demonstrate the robust convergence properties of CASTOR, highlighting in particular its ability to accurately identify distinct regimes. Empirical evidence from extensive synthetic experiments and two real-world benchmarks confirms CASTOR's superior causal discovery performance compared to baseline methods. By learning a full temporal causal graph for each regime, CASTOR stands out as a particularly interpretable method for causal discovery in heterogeneous time series.
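The abstract describes CASTOR only at a high level. To illustrate the general idea of EM that alternates between assigning time steps to regimes and refitting a per-regime causal model, the following is a minimal sketch of EM for a mixture of linear lag-1 (VAR(1)) models. It is not CASTOR's algorithm: it fixes the number of regimes `K` instead of inferring it, uses a Gaussian likelihood rather than CASTOR's score function, models only linear lag-1 edges, and reads each regime's graph off the fitted weights with a hypothetical threshold `tau`. The function names `fit_regime_var` and `simulate`, the contiguous-block initialization, and the synthetic two-regime data are all assumptions made for illustration.

```python
import numpy as np


def fit_regime_var(X, K, n_iter=100, init_weight=0.9, tau=0.25):
    """Hypothetical sketch: EM for a mixture of K linear lag-1 models (not CASTOR).

    X has shape (T, d). Returns per-step regime posteriors R (T-1, K), lag-1
    weights W[k, j, i] for the edge x_j(t-1) -> x_i(t) in regime k, thresholded
    adjacency matrices A, and the observed-data log-likelihood at each iteration.
    """
    Z, Y = X[:-1], X[1:]  # predictors x_{t-1} and targets x_t
    n, d = Z.shape

    def m_step(R):
        # Weighted least squares per regime, then isotropic noise variance and mixing weights.
        W = np.empty((K, d, d))
        s2 = np.empty(K)
        for k in range(K):
            w = np.sqrt(R[:, k])[:, None]
            W[k] = np.linalg.lstsq(w * Z, w * Y, rcond=None)[0]
            resid = ((Y - Z @ W[k]) ** 2).sum(axis=1)
            s2[k] = max(R[:, k] @ resid / (d * R[:, k].sum() + 1e-12), 1e-8)
        return W, s2, R.mean(axis=0)

    def e_step(W, s2, mix):
        # Log joint of each step under each regime; posteriors via log-sum-exp.
        sq = np.stack([((Y - Z @ W[k]) ** 2).sum(axis=1) for k in range(K)], axis=1)
        log_p = (np.log(np.maximum(mix, 1e-12))
                 - 0.5 * d * np.log(2 * np.pi * s2) - sq / (2 * s2))
        top = log_p.max(axis=1, keepdims=True)
        lse = top + np.log(np.exp(log_p - top).sum(axis=1, keepdims=True))
        return np.exp(log_p - lse), float(lse.sum())

    # Assumed initialization: K contiguous time blocks, softened by init_weight.
    block = np.arange(n) * K // n
    R = np.full((n, K), 1.0 - init_weight)
    R[np.arange(n), block] = init_weight
    R /= R.sum(axis=1, keepdims=True)

    W, s2, mix = m_step(R)
    trace = []
    for _ in range(n_iter):
        R, ll = e_step(W, s2, mix)
        trace.append(ll)
        W, s2, mix = m_step(R)
    A = np.abs(W) > tau  # hypothetical threshold turning weights into edges
    return R, W, A, trace


def simulate(T=1500, sigma=0.1, seed=0):
    """Synthetic two-regime series: regime 0 for the first half, regime 1 after."""
    rng = np.random.default_rng(seed)
    W0 = np.array([[0.9, 0.5, 0.0], [0.0, 0.9, 0.0], [0.0, 0.0, 0.9]])
    W1 = np.array([[-0.9, 0.0, 0.0], [0.0, -0.9, 0.0], [0.5, 0.0, -0.9]])
    labels = (np.arange(1, T) >= T // 2).astype(int)  # regime generating x_t, t = 1..T-1
    X = np.zeros((T, 3))
    for t in range(1, T):
        W = W1 if labels[t - 1] else W0
        X[t] = X[t - 1] @ W + sigma * rng.standard_normal(3)
    return X, labels, (W0, W1)


X, labels, (W0, W1) = simulate()
R, W, A, trace = fit_regime_var(X, K=2)
pred = R.argmax(axis=1)
print("regime accuracy:", max((pred == labels).mean(), (pred != labels).mean()))
print("edges per regime:", A.sum(axis=(1, 2)))
```

Because each M-step maximizes the expected complete-data log-likelihood exactly (weighted least squares for the weights, closed-form variance and mixing weights), the recorded log-likelihood should not decrease across iterations. Moving this sketch toward what the abstract describes would at least require inferring `K` and allowing non-linear, multi-lag per-regime models, which it does not attempt.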