Causal discovery aims to recover a causal graph from data generated by it; constraint-based methods do so by using an oracle to search for a conditioning set of nodes that d-separates a given pair of nodes in the graph. In this paper, we provide analytic evidence that on large graphs, d-separation is a rare phenomenon, even when it is guaranteed to exist, unless the graph is extremely sparse. We then provide an analytic average-case analysis of the PC Algorithm for causal discovery, as well as of a variant of the SGS Algorithm that we call UniformSGS. We consider a set $V=\{v_1,\ldots,v_n\}$ of nodes and generate a random DAG $G=(V,E)$ in which, independently for each pair, $(v_a, v_b) \in E$ with probability $p_1$ if $a<b$ and with probability $0$ if $a>b$. We provide upper bounds on the probability that a subset of $V \setminus \{x,y\}$ d-separates $x$ and $y$, conditional on $x$ and $y$ being d-separable; these upper bounds decay exponentially fast to $0$ as $|V| \rightarrow \infty$. For the PC Algorithm, it is known that the worst-case guarantees fail on non-sparse graphs; we show that the same holds in the average case, and that the sparsity requirement is quite demanding: for good performance, the density must go to $0$ as $|V| \rightarrow \infty$ even in the average case. For UniformSGS, it is known that the running time is exponential for existing edges; we show that, in the average case, the expected running time is exponential for most non-existing edges as well.
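The random DAG model and the d-separation question above can be explored numerically. The following is a minimal sketch, not the analysis of the paper: it samples a DAG in which each forward edge is present independently with probability `p`, and tests d-separation with the standard moralized ancestral graph criterion. The function names (`random_dag`, `d_separated`, `separating_fraction`) are hypothetical, and drawing the candidate conditioning set uniformly at random from the subsets of $V \setminus \{x,y\}$ is an assumption made here for illustration. The sketch relies on the fact that two nodes of a DAG are d-separable exactly when they are non-adjacent.

```python
import random
from itertools import combinations


def random_dag(n, p, rng):
    """Sample a DAG on nodes 0..n-1; edge a -> b (a < b) is present independently w.p. p.

    Returns a list where parents[b] is the set of parents of node b.
    """
    return [{a for a in range(b) if rng.random() < p} for b in range(n)]


def d_separated(parents, x, y, z):
    """Return True if z d-separates x and y (moralized ancestral graph criterion)."""
    # 1. Ancestral closure of {x, y} and z.
    anc, stack = set(), [x, y, *z]
    while stack:
        v = stack.pop()
        if v not in anc:
            anc.add(v)
            stack.extend(parents[v])
    # 2. Moralize: drop directions and connect every pair of co-parents.
    adj = {v: set() for v in anc}
    for v in anc:
        for a in parents[v]:
            adj[v].add(a)
            adj[a].add(v)
        for a, b in combinations(parents[v], 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. x and y are d-separated iff z cuts every path between them.
    blocked, seen, stack = set(z), {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True


def separating_fraction(n, p, trials, seed=0):
    """Estimate how often a uniformly random subset d-separates a non-adjacent pair.

    Returns None if no sampled pair was non-adjacent (e.g. when p is 1).
    """
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(trials):
        parents = random_dag(n, p, rng)
        x, y = sorted(rng.sample(range(n), 2))
        if x in parents[y]:
            continue  # adjacent pairs are never d-separable
        # Assumption: each other node joins the conditioning set w.p. 1/2.
        z = {v for v in range(n) if v not in (x, y) and rng.random() < 0.5}
        hits += d_separated(parents, x, y, z)
        total += 1
    return hits / total if total else None
```

Calling `separating_fraction(n, p, trials)` for increasing `n` at a fixed `p` gives an empirical view of how rarely a random subset separates a d-separable pair; the numbers it produces are Monte Carlo estimates under the assumptions stated above, not the bounds derived in the paper.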