Directed acyclic graphs are used to model the causal structure of a system. ``Causal discovery'' describes the problem of learning this structure from data. When data is an aggregate from multiple sources (populations or environments), global confounding obscures the conditional independence properties that drive many causal discovery algorithms. This setting is sometimes known as a mixture model or a latent class. While some modern methods for causal discovery are able to work around unobserved confounding in specific cases, the only known ways to deal with a global confounder involve parametric assumptions that are unsuitable for discrete distributions. Focusing on discrete and non-parametric observed variables, we demonstrate that causal discovery can still be identifiable under bounded latent classes. The feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.
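To make concrete how aggregating sources can obscure independence properties, the following is a minimal sketch and is not taken from the method described here. It assumes a hypothetical binary latent class U with two equally weighted sources and two binary observed variables X and Y that are independent within each source; all probabilities and function names are illustrative assumptions. The mutual information between X and Y is computed exactly, first within each source and then in the pooled data where U is unobserved.

```python
from itertools import product
from math import log

# Hypothetical two-source mixture; every number here is an illustrative assumption.
p_u = [0.5, 0.5]            # P(U = u): mixing weights of the latent class
p_x1_given_u = [0.1, 0.9]   # P(X = 1 | U = u)
p_y1_given_u = [0.1, 0.9]   # P(Y = 1 | U = u)


def bern(p1, v):
    """Probability that a Bernoulli(p1) variable takes value v in {0, 1}."""
    return p1 if v == 1 else 1.0 - p1


def joint_within(u):
    """Joint P(X, Y | U = u); X and Y are independent inside each source."""
    return {(x, y): bern(p_x1_given_u[u], x) * bern(p_y1_given_u[u], y)
            for x, y in product((0, 1), repeat=2)}


def joint_pooled():
    """Joint P(X, Y) after marginalizing out the unobserved class U."""
    tables = [joint_within(u) for u in range(len(p_u))]
    return {(x, y): sum(p_u[u] * tables[u][(x, y)] for u in range(len(p_u)))
            for x, y in product((0, 1), repeat=2)}


def mutual_information(joint):
    """Mutual information I(X; Y) in nats for a joint table over {0, 1}^2."""
    px = {x: joint[(x, 0)] + joint[(x, 1)] for x in (0, 1)}
    py = {y: joint[(0, y)] + joint[(1, y)] for y in (0, 1)}
    return sum(p * log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)


mi_within = [mutual_information(joint_within(u)) for u in range(len(p_u))]
mi_pooled = mutual_information(joint_pooled())
print("I(X;Y | U=u) per source:", mi_within)
print("I(X;Y) in pooled data:  ", mi_pooled)
```

Under these assumed numbers the within-source mutual information is zero up to floating-point error, while the pooled value is clearly positive. An independence test run on the aggregate would therefore report a dependence between X and Y that no single source exhibits.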