Inferring causal relationships from time series data is a formidable challenge in fields such as finance, climate science, and neuroscience. Contemporary techniques can handle nonlinear relationships between variables and flexible noise distributions, but they rely on the simplifying assumption that all of the data originates from a single underlying causal model. In this work, we relax this assumption and perform causal discovery from time series data originating from a mixture of different causal models. We infer both the underlying structural causal models and the posterior probability that each sample belongs to a specific mixture component. Our approach employs an end-to-end training process that maximizes an evidence lower bound (ELBO) on the data likelihood. Through extensive experiments on both synthetic and real-world datasets, we demonstrate that our method outperforms state-of-the-art baselines in causal discovery tasks, particularly when the data comes from diverse underlying causal graphs. On the theoretical side, we prove that such a model is identifiable under some mild assumptions.
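To make the two inferred quantities concrete, the following is a minimal sketch of how a per-sample membership posterior and an evidence lower bound relate in a generic mixture model. It is not the authors' implementation: it assumes the log-likelihood of each sample under each candidate causal model has already been computed, and the names `membership_posterior`, `elbo`, `log_lik`, and `log_pi` are hypothetical.

```python
import numpy as np


def log_sum_exp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def membership_posterior(log_lik, log_pi):
    """Posterior probability that each sample belongs to each component.

    log_lik: (N, K) array, log p(x_n | component k)  (assumed given)
    log_pi:  (K,) array, log mixing weights
    """
    joint = log_lik + log_pi  # log p(x_n, z_n = k)
    return np.exp(joint - log_sum_exp(joint, axis=1)[:, None])


def elbo(log_lik, log_pi, q):
    """Per-sample ELBO: E_q[log p(x, z)] + entropy of q, for any q of shape (N, K)."""
    joint = log_lik + log_pi
    safe_q = np.clip(q, 1e-12, 1.0)  # avoid log(0); terms with q = 0 contribute 0
    return np.sum(q * (joint - np.log(safe_q)), axis=1)


if __name__ == "__main__":
    # Two samples, two candidate causal models (illustrative numbers only).
    log_lik = np.array([[-1.0, -5.0], [-4.0, -0.5]])
    log_pi = np.log(np.array([0.5, 0.5]))

    post = membership_posterior(log_lik, log_pi)
    print("posterior membership:\n", post)
    print("ELBO at the exact posterior:", elbo(log_lik, log_pi, post))
    print("ELBO at a uniform q:", elbo(log_lik, log_pi, np.full((2, 2), 0.5)))
```

In this sketch the bound is tight when `q` equals the exact posterior and strictly looser for any other `q`, which is why maximizing it jointly over the model parameters and `q` is a reasonable training objective for a mixture.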