Unbiased data synthesis is crucial for evaluating causal discovery algorithms in the presence of unobserved confounding, given the scarcity of real-world datasets. A common approach, implicit parameterization, encodes unobserved confounding by modifying the off-diagonal entries of the idiosyncratic covariance matrix while preserving positive definiteness. Within this approach, we identify two distinct issues in state-of-the-art protocols that hinder unbiased sampling from the complete space of causal models: first, the use of diagonally dominant constructions restricts the spectrum of partial correlation matrices, which we analyze in detail; and second, the sampling of bidirected edges restricts the possible graphical structures, unnecessarily ruling out valid causal models. To address these limitations, we propose an improved explicit modeling approach for unobserved confounding, leveraging block-hierarchical ancestral generation of ground-truth causal graphs. We provide algorithms for converting the ground-truth DAG into an ancestral graph, against which the output of causal discovery algorithms can be compared. We draw connections between implicit and explicit parameterization and prove that our approach fully covers the space of causal models, including those generated by implicit parameterization, thus enabling more robust evaluation of methods for causal discovery and inference.
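To make the first issue concrete, the following is a minimal sketch of an implicit parameterization that keeps the noise covariance positive definite through strict diagonal dominance. It is not the protocol analyzed in the paper: the helper names `sample_diag_dominant_cov` and `partial_correlations`, the edge probability, and the sampling ranges are all assumptions chosen for illustration. The sketch shows only a well-known consequence of the construction: each row's off-diagonal absolute sum must stay below its diagonal entry, which is a stronger condition than positive definiteness itself requires.

```python
import numpy as np


def sample_diag_dominant_cov(d, edge_prob, rng):
    """Hypothetical helper: symmetric, strictly diagonally dominant covariance.

    Nonzero off-diagonal entries stand in for bidirected (confounded) pairs.
    """
    # Choose which pairs (i < j) receive a nonzero covariance entry.
    mask = np.triu(rng.random((d, d)) < edge_prob, k=1)
    vals = rng.uniform(-1.0, 1.0, size=(d, d))
    off = np.where(mask, vals, 0.0)
    off = off + off.T  # symmetrize; the diagonal is still zero here
    # Each diagonal entry strictly exceeds its row's absolute off-diagonal sum,
    # so the matrix is positive definite by the Gershgorin circle theorem.
    diag = np.abs(off).sum(axis=1) + rng.uniform(0.1, 1.0, size=d)
    return off + np.diag(diag)


def partial_correlations(cov):
    """Partial correlation matrix obtained from the precision matrix."""
    prec = np.linalg.inv(cov)
    scale = np.sqrt(np.diag(prec))
    pcorr = -prec / np.outer(scale, scale)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr


rng = np.random.default_rng(0)
cov = sample_diag_dominant_cov(d=6, edge_prob=0.5, rng=rng)

# Smallest eigenvalue: positive because of strict diagonal dominance.
min_eig = np.linalg.eigvalsh(cov).min()

# Ratio of off-diagonal absolute row sum to the diagonal entry.
# Diagonal dominance forces every ratio below 1; a general positive
# definite matrix is not bound by this constraint.
row_ratio = (np.abs(cov).sum(axis=1) - np.diag(cov)) / np.diag(cov)

pcorr = partial_correlations(cov)
```

Inspecting `row_ratio` and the eigenvalues of `pcorr` over many draws, and comparing them with draws from an unconstrained positive definite sampler, is one way to see empirically which part of the model space such a construction never reaches.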