Real-world problems, for example in climate applications, often require causal reasoning on spatially gridded time series data or on data with comparable structure. While the underlying system is often believed to behave similarly at different points in space and time, the variations that do exist matter in two ways: they often encode important information in their own right, and, if not accounted for, they may undermine the stability and validity of results. We study the information encoded in changes of the causal graph, with stability in mind. Two core challenges arise: the complexity of encoding system states, and statistical convergence in the presence of non-stationary structure that can only be imperfectly recovered. We provide a framework that realizes principles conceptually suited to overcoming these challenges, an interpretation supported by numerical experiments. Primarily, we modify constraint-based causal discovery approaches at the level of independence testing. The resulting framework is, in addition, highly modular, easily extensible and widely applicable. For example, it lets one leverage existing constraint-based causal discovery methods (demonstrated on PC, PC-stable, FCI, PCMCI, PCMCI+ and LPCMCI) and systematically divide the problem into simpler subproblems that are easier to analyze and understand, and that relate more clearly to well-studied problems such as change-point detection, clustering and independence testing, among others. Code is available at https://github.com/martin-rabel/Causal_GLDF.
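The design idea of reusing an existing constraint-based algorithm unchanged and modifying only the independence test it calls can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository). `pc_skeleton` is a plain, order-dependent PC skeleton search that accepts any CI-test callable, and `fisher_z_test` is a standard partial-correlation test. `segmentwise_test` is a hypothetical wrapper that assumes the regions of the data are already known and simply applies the base test per region with a Bonferroni correction. The toy data, in which the sign of a dependence flips between two regions, are likewise an assumption, chosen to show how pooled testing can miss structure that varies across space or time.

```python
import itertools
import math

import numpy as np


def fisher_z_test(data, i, j, cond, alpha=0.01):
    """Partial-correlation (Fisher z) test; True means X_i and X_j are judged
    conditionally independent given the columns listed in `cond`."""
    idx = [i, j] + list(cond)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])
    r = min(max(r, -0.999999), 0.999999)  # keep the logarithm finite
    stat = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(data.shape[0] - len(cond) - 3)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(stat) / math.sqrt(2))))
    return p_value > alpha


def segmentwise_test(base_test, segments, alpha=0.01):
    """HYPOTHETICAL wrapper (not the authors' method): assumes the regions are
    known in advance, runs `base_test` on each region at a Bonferroni-corrected
    level, and reports independence only if no region rejects it."""
    level = alpha / len(segments)

    def test(data, i, j, cond):
        return all(base_test(data[seg], i, j, cond, alpha=level) for seg in segments)

    return test


def pc_skeleton(data, ci_test):
    """Skeleton phase of the (order-dependent) PC algorithm. `ci_test` is any
    callable (data, i, j, cond) -> bool, so the search itself never changes."""
    d = data.shape[1]
    adj = {v: set(range(d)) - {v} for v in range(d)}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                if len(others) < depth:
                    continue
                for cond in itertools.combinations(others, depth):
                    if ci_test(data, i, j, cond):
                        adj[i].discard(j)  # remove the edge i - j
                        adj[j].discard(i)
                        break
        depth += 1
    return {(i, j) for i in range(d) for j in adj[i] if i < j}


# Toy data (an assumption for illustration): X1 depends on X0 in both regions,
# but the sign flips, so the dependence cancels in the pooled sample
# (up to floating-point rounding).
rng = np.random.default_rng(0)
a = rng.normal(size=500)
e = 0.5 * rng.normal(size=500)
data = np.column_stack([np.concatenate([a, a]), np.concatenate([a + e, -(a + e)])])
segments = [np.arange(0, 500), np.arange(500, 1000)]

print(pc_skeleton(data, fisher_z_test))                              # -> set()
print(pc_skeleton(data, segmentwise_test(fisher_z_test, segments)))  # -> {(0, 1)}
```

Because only the test callable changes, the same wrapper could in principle be handed to other constraint-based searches. How the actual framework identifies regions and combines tests is described in the paper and its code, not in this sketch.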