In causal discovery with latent variables, we define two data paradigms. Definite data has a single-skeleton structure with single-valued observed nodes. Indefinite data consists of a set of multi-skeleton structures with multi-valued observed nodes. Multiple skeletons lead to low sample utilization, and multiple values make the usual distributional assumptions inapplicable. Together, these issues explain why recovering causal relations from indefinite data remains largely unexplored. We design a causal strength variational model to address both problems. Specifically, we use causal strength, rather than independent noise, as the latent variable that mediates the evidence lower bound (ELBO). With this design, the causal strengths of different skeletons are treated as a distribution and can be expressed as a single-valued causal graph matrix. Moreover, to account for latent confounders, we disentangle the causal graph G into two relation subgraphs, O and C: O contains the pure relations among observed nodes, while C represents the relations from latent variables to observed nodes. We call this combined design Confounding Disentanglement Causal Discovery (biCD); it is tailored to learning causal representations from indefinite data under latent confounding. Finally, we conduct comprehensive experiments on synthetic and real-world data to demonstrate the effectiveness of our method.
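To make the split of G into O and C concrete, here is a minimal sketch under an assumed convention. It is not the biCD implementation. The function name `disentangle`, the node ordering (observed nodes first, latent confounders last), and the reading of G[i][j] as the causal strength of edge i -> j are all hypothetical choices made for illustration. Under these assumptions, O is the observed-to-observed block and C is the latent-to-observed block.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def disentangle(G: Matrix, n_observed: int) -> Tuple[Matrix, Matrix]:
    """Split a weighted causal graph matrix G into subgraphs O and C.

    Hypothetical convention (an assumption, not taken from the paper):
      - rows/columns 0 .. n_observed-1 are observed nodes,
      - the remaining rows/columns are latent confounders,
      - G[i][j] is the causal strength of the edge i -> j.
    Columns for latent targets are dropped, because O and C only
    describe edges that point into observed nodes.
    """
    n = len(G)
    if any(len(row) != n for row in G):
        raise ValueError("G must be a square matrix")
    if not 0 < n_observed <= n:
        raise ValueError("n_observed must be in the range (0, len(G)]")

    # O: top-left block, observed -> observed relations (n_observed x n_observed)
    O = [list(row[:n_observed]) for row in G[:n_observed]]
    # C: bottom-left block, latent -> observed relations (n_latent x n_observed)
    C = [list(row[:n_observed]) for row in G[n_observed:]]
    return O, C


if __name__ == "__main__":
    # Two observed nodes x0, x1 and one latent confounder z.
    G = [
        [0.0, 0.8, 0.0],  # x0 -> x1 with strength 0.8
        [0.0, 0.0, 0.0],  # x1 has no outgoing edges
        [0.5, 0.3, 0.0],  # z -> x0 (0.5) and z -> x1 (0.3)
    ]
    O, C = disentangle(G, n_observed=2)
    print(O)  # [[0.0, 0.8], [0.0, 0.0]]
    print(C)  # [[0.5, 0.3]]
```

Keeping O and C as separate blocks lets the observed-only structure be read off directly, while every confounding effect is confined to C; stacking O on top of C recovers the observed-target columns of G.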