Granger Causal Discovery (GCD) is fundamental to analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm and struggle to capture the distribution shifts and dynamic regime changes inherent in real-world time series, which often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism that operates across variables and yields sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state of the art on fully supervised benchmarks, while generalizing effectively to few-shot settings where traditional methods fail.
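To make the phrase "sparse Granger causal graphs via proximal optimization" concrete, the following is a minimal sketch under simplifying assumptions, not the CausalMoE implementation. It replaces the Causality-Aware Self-Attention with a plain linear vector autoregression and uses a group-lasso penalty over lags, solved by proximal gradient descent (ISTA). All names (`simulate_toy_series`, `fit_sparse_var`, `granger_graph`) and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_toy_series(T=5000, seed=0):
    """Toy 3-variable series (an assumption for illustration): x0 drives x1, x2 is independent."""
    rng = np.random.default_rng(seed)
    X = np.zeros((T, 3))
    for t in range(1, T):
        e = rng.standard_normal(3)
        X[t, 0] = 0.5 * X[t - 1, 0] + e[0]
        X[t, 1] = 0.8 * X[t - 1, 0] + 0.2 * X[t - 1, 1] + e[1]
        X[t, 2] = 0.5 * X[t - 1, 2] + e[2]
    return X


def group_soft_threshold(W, thresh):
    """Proximal operator of thresh * sum_{i,j} ||W[:, i, j]||_2 (one group per variable pair)."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)  # shape (1, d, d)
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return W * scale  # groups with norm <= thresh become exactly zero


def fit_sparse_var(X, n_lags=2, lam=0.1, n_iter=500):
    """Proximal gradient descent (ISTA) on a group-sparse linear VAR.

    X has shape (T, d). Returns W of shape (n_lags, d, d), where W[k, i, j]
    is the weight of variable j at lag k + 1 when predicting variable i.
    """
    T, d = X.shape
    Y = X[n_lags:]  # prediction targets, shape (N, d)
    # Z[k - 1, n] holds the lag-k values aligned with target row n.
    Z = np.stack([X[n_lags - k:T - k] for k in range(1, n_lags + 1)], axis=0)
    N = Y.shape[0]

    # Step size 1/L, with L the Lipschitz constant of the squared-loss gradient.
    Z_flat = np.concatenate(list(Z), axis=1)  # shape (N, n_lags * d)
    step = N / np.linalg.norm(Z_flat, 2) ** 2

    W = np.zeros((n_lags, d, d))
    for _ in range(n_iter):
        resid = np.einsum("knj,kij->ni", Z, W) - Y       # prediction error
        grad = np.einsum("ni,knj->kij", resid, Z) / N    # gradient of (1/2N)||resid||^2
        W = group_soft_threshold(W - step * grad, step * lam)
    return W


def granger_graph(W):
    """Boolean adjacency: entry [i, j] is True if variable j Granger-causes variable i."""
    return np.linalg.norm(W, axis=0) > 0


if __name__ == "__main__":
    X = simulate_toy_series()
    W = fit_sparse_var(X)
    print(granger_graph(W).astype(int))
```

The proximal step is what yields exact zeros: any variable pair whose lag weights fall below the threshold is removed from the graph entirely, rather than being shrunk to a small nonzero value that would then need ad hoc thresholding.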