The behavior of extreme observations is well understood for time series and spatial data, but little is known when the data-generating process is a structural causal model (SCM). We study the behavior of extremes in this model class, both for the observational distribution and under extremal interventions. We show that under suitable regularity conditions on the structure functions, the extremal behavior is described by a multivariate Pareto distribution, which can be represented as a new SCM on an extremal graph. Importantly, the extremal graph is a subgraph of the graph in the original SCM, which means that causal links can disappear in the tails. We further introduce a directed version of extremal graphical models and show that an extremal SCM satisfies the corresponding Markov properties. Based on a new test of extremal conditional independence, we propose two algorithms for learning the extremal causal structure from data. The first is an extremal version of the PC algorithm, and the second is a pruning algorithm that removes edges from the original graph to consistently recover the extremal graph. The methods are illustrated on river data with known causal ground truth.
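The claim that causal links can disappear in the tails can be made concrete with a small simulation. The sketch below is a toy illustration and is not taken from the paper: the three-variable SCM, the bounded structure function `min(x1, 2)`, the Pareto noise with tail index `alpha = 2`, and the helper names `rank_corr` and `chi_hat` are all assumptions chosen for illustration. The empirical tail dependence coefficient is used here as a generic summary of extremal dependence; it is not the extremal conditional independence test proposed in the paper.

```python
import numpy as np


def rank_corr(x, y):
    """Spearman rank correlation using NumPy only (data are continuous, so no ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]


def chi_hat(x, y, q=0.99):
    """Empirical tail dependence: P(y above its q-quantile | x above its q-quantile)."""
    x_ext = x > np.quantile(x, q)
    y_ext = y > np.quantile(y, q)
    return np.mean(y_ext[x_ext])


rng = np.random.default_rng(0)
n, alpha = 200_000, 2.0

# Independent Pareto(alpha) noise terms on [1, inf); rng.pareto samples the
# Lomax form on [0, inf), hence the shift by 1.
e1, e2, e3 = (rng.pareto(alpha, n) + 1.0 for _ in range(3))

# Hypothetical SCM with edges X1 -> X2 and X1 -> X3.
x1 = e1
x2 = np.minimum(x1, 2.0) + e2  # bounded effect: an extreme X1 cannot push X2 far
x3 = x1 + e3                   # linear effect: an extreme X1 makes X3 extreme

bulk_12 = rank_corr(x1, x2)    # dependence over the whole sample
chi_12 = chi_hat(x1, x2)       # dependence in the joint upper tail
chi_13 = chi_hat(x1, x3)

print(f"rank correlation X1, X2: {bulk_12:.3f}")
print(f"tail dependence  X1, X2: {chi_12:.3f}")
print(f"tail dependence  X1, X3: {chi_13:.3f}")
```

In this construction X1 and X2 are clearly dependent over the whole sample, yet their tail dependence estimate stays close to the value 1 - q = 0.01 expected under independence, whereas X1 and X3 remain strongly dependent in the tail. The edge X1 -> X2 is therefore present in the original graph but would plausibly be absent from an extremal graph, while X1 -> X3 would be retained.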