Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the idiosyncratic covariance matrix while preserving positive definiteness. Within this approach, state-of-the-art protocols have two distinct issues that hinder unbiased sampling from the complete space of causal models. First, they use diagonally dominant constructions, which restrict the spectrum of partial correlation matrices. Second, they restrict the admissible graphical structures when sampling bidirected edges, which unnecessarily rules out valid causal models. To address these limitations, we propose an improved explicit modeling approach for unobserved confounding, based on block-hierarchical ancestral generation of ground-truth causal graphs. We also provide algorithms that convert the ground-truth DAG into an ancestral graph, against which the output of causal discovery algorithms can be compared. We prove that our approach fully covers the space of causal models, including those generated by the implicit parameterization, thus enabling more robust evaluation of methods for causal discovery and inference.
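The implicit parameterization and the spectral restriction imposed by diagonal dominance can be illustrated with a minimal sketch, assuming a linear Gaussian SEM $X = BX + \varepsilon$ with $\mathrm{Cov}(\varepsilon) = \Omega$, where a nonzero off-diagonal entry $\Omega_{ij}$ plays the role of a bidirected edge $i \leftrightarrow j$. All helper names below are hypothetical and do not reproduce any specific published protocol or the method proposed here. A strictly diagonally dominant symmetric matrix with positive diagonal is guaranteed positive definite (Gershgorin circle theorem), but the condition is only sufficient: for a unit-diagonal matrix it confines every eigenvalue to $(0, 2)$, so the positive definite equicorrelation matrix with $\rho = 0.6$ (eigenvalues $2.2, 0.4, 0.4$) can never be produced by such a construction.

```python
import numpy as np


def is_strictly_diagonally_dominant(m: np.ndarray) -> bool:
    """True if |m_ii| > sum_{j != i} |m_ij| for every row i."""
    diag = np.abs(np.diag(m))
    off = np.abs(m).sum(axis=1) - diag
    return bool(np.all(diag > off))


def is_positive_definite(m: np.ndarray) -> bool:
    """Check positive definiteness via a Cholesky factorization."""
    try:
        np.linalg.cholesky(m)
        return True
    except np.linalg.LinAlgError:
        return False


def diag_dominant_error_cov(n, rng, edges, scale=1.0):
    """Hypothetical implicit parameterization: put a random weight on each
    bidirected edge (i, j), then set each diagonal entry strictly above the
    absolute off-diagonal sum of its row, which guarantees positive definiteness."""
    omega = np.zeros((n, n))
    for i, j in edges:
        w = rng.uniform(-scale, scale)
        omega[i, j] = omega[j, i] = w
    row_sums = np.abs(omega).sum(axis=1)  # diagonal is still zero here
    np.fill_diagonal(omega, row_sums + rng.uniform(0.1, 1.0, size=n))
    return omega


def implied_covariance(b, omega):
    """Observed covariance of X = B X + eps, i.e. (I - B)^-1 Omega (I - B)^-T.
    Convention (assumed): b[i, j] is the coefficient of X_j in the equation for X_i."""
    a = np.linalg.inv(np.eye(b.shape[0]) - b)
    return a @ omega @ a.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Chain X0 -> X1 -> X2 plus a bidirected edge X0 <-> X2 encoded in Omega.
    b = np.array([[0.0, 0.0, 0.0], [0.8, 0.0, 0.0], [0.0, -0.5, 0.0]])
    omega = diag_dominant_error_cov(3, rng, edges=[(0, 2)])
    print(is_positive_definite(implied_covariance(b, omega)))

    # Equicorrelation matrix: positive definite but not diagonally dominant.
    r = np.full((3, 3), 0.6)
    np.fill_diagonal(r, 1.0)
    print(is_positive_definite(r), is_strictly_diagonally_dominant(r))
    print(np.linalg.eigvalsh(r).max())  # largest eigenvalue exceeds 2
```

The check uses a Cholesky factorization rather than an eigendecomposition because it fails fast on non-positive-definite input; the eigenvalue call is used only to show where the spectrum falls outside the Gershgorin bound.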