Causal discovery on vector-valued variables and consistency-guided aggregation

Causal discovery (CD) aims to recover the causal graph underlying the data-generating mechanism of observed variables. In many real-world applications, the observed variables are vector-valued. In climate science, for example, variables are defined over a spatial grid, and the task is called spatio-temporal causal discovery. We motivate CD in the vector-valued setting while considering different possibilities for the underlying model, and we highlight the pitfalls of commonly used approaches compared with a fully vectorized approach. Furthermore, the vector-valued variables are often high-dimensional, so aggregations of them, such as averages, are used in the interest of efficiency and robustness. Without interventional data, it is hard to test whether aggregate variables are sound, that is, whether they are consistent abstractions that map a low-level structural causal model (SCM) to a high-level one. Recent work has shown how stringent the conditions for testing consistency are. In this work, we take a careful look at vector-valued CD via constraint-based methods, focusing on the consistency of aggregation for this task. We derive three aggregation consistency scores, based on the compatibility of independence models and (partial) aggregation, that quantify different aspects of the soundness of an aggregation map for the CD problem. We argue that the consistency of causal abstractions must be separated from the task-dependent consistency of aggregation maps. As an actionable conclusion of our findings, we propose Adag, a wrapper that optimizes a chosen aggregation consistency score for aggregate CD, to make the output of CD over aggregate variables more reliable. We support all our findings with experimental evaluations on synthetic non-time-series and spatio-temporal data.
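
To make the aggregation pitfall concrete, here is a minimal toy sketch. It is not the paper's consistency scores or Adag. It assumes a hypothetical linear-Gaussian chain X -> Y -> Z with each variable in R^2, uses averaging as the aggregation map, and uses a Fisher-z partial-correlation test as a stand-in conditional-independence (CI) test. All names (`simulate`, `mean_agg`, `partial_corr`, and so on) are illustrative. X raises Y1 and lowers Y2, and Z reads only Y1. As a result, X and Z stay independent given the full vector Y, but the averaged variables show a different independence pattern.

```python
import math
import random


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residualize(target, conds):
    """Least-squares residual of `target` on the conditioning columns
    (plus an intercept), via Gram-Schmidt orthogonalization."""
    basis = []
    for c in conds:
        c = center(c)
        for b in basis:
            coef = dot(c, b) / dot(b, b)
            c = [ci - coef * bi for ci, bi in zip(c, b)]
        basis.append(c)
    r = center(target)
    for b in basis:
        coef = dot(r, b) / dot(b, b)
        r = [ri - coef * bi for ri, bi in zip(r, b)]
    return r


def partial_corr(x, y, conds):
    """Sample partial correlation of x and y given the columns in `conds`."""
    rx, ry = residualize(x, conds), residualize(y, conds)
    return dot(rx, ry) / math.sqrt(dot(rx, rx) * dot(ry, ry))


def fisher_z_pvalue(r, n, k):
    """Two-sided p-value for H0: partial corr = 0, given k conditioning
    variables (assumes Gaussian data)."""
    z = math.atanh(r) * math.sqrt(n - k - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(n=5000, seed=0):
    """Hypothetical linear-Gaussian chain X -> Y -> Z, each variable in R^2."""
    rng = random.Random(seed)
    X, Y, Z = [], [], []
    for _ in range(n):
        x = [rng.gauss(0, 1), rng.gauss(0, 1)]
        s = x[0] + x[1]
        y = [s + rng.gauss(0, 1), -s + rng.gauss(0, 1)]  # X raises Y1, lowers Y2
        z = [y[0] + rng.gauss(0, 1), y[0] + rng.gauss(0, 1)]  # Z reads Y1 only
        X.append(x)
        Y.append(y)
        Z.append(z)
    return X, Y, Z


def mean_agg(rows):
    """Aggregation map: average the components of each sample."""
    return [sum(r) / len(r) for r in rows]


X, Y, Z = simulate()
Xbar, Ybar, Zbar = mean_agg(X), mean_agg(Y), mean_agg(Z)
Y_cols = [[y[0] for y in Y], [y[1] for y in Y]]
n = len(Xbar)

r_vec = partial_corr(Xbar, Zbar, Y_cols)  # condition on the full vector Y
r_agg = partial_corr(Xbar, Zbar, [Ybar])  # condition on the aggregate Ybar
r_marg = partial_corr(Xbar, Ybar, [])     # Xbar vs Ybar, unconditional

for name, r, k in [("Xbar,Zbar | Y", r_vec, 2),
                   ("Xbar,Zbar | Ybar", r_agg, 1),
                   ("Xbar,Ybar", r_marg, 0)]:
    print(f"{name:18s} r={r:+.3f} p={fisher_z_pvalue(r, n, k):.3g}")
```

In this toy model, `r_vec` and `r_marg` should be close to zero. `r_agg` should be near its population value of 1/sqrt(1.5), about 0.82. The aggregates therefore satisfy Xbar independent of Ybar, while Xbar and Zbar are dependent given Ybar. This is the pattern of a collider Xbar -> Zbar <- Ybar, not of the chain X -> Y -> Z. The example shows how averaging can both create and destroy CI statements that a constraint-based method relies on.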
