An acyclic causal structure can be described by a directed acyclic graph (DAG) whose arrows indicate causation. The task of learning these structures from data is known as ``causal discovery''. Diverse populations or changing environments can sometimes give rise to heterogeneous data. This heterogeneity can be viewed as a mixture model with multiple ``sources'', each leaving its own distinct signature on the observed variables. From this perspective, the source is a latent common cause of every observed variable. While some causal discovery methods can work around unobserved confounding in special cases, the only known ways to handle a global confounder (such as a latent class) rely on parametric assumptions. These assumptions are restrictive, especially for discrete variables. Focusing on discrete observables, we show that globally confounded causal structures can still be identifiable without parametric assumptions, provided the number of latent classes remains small relative to the size and sparsity of the underlying DAG.
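To make the idea of a global confounder concrete, here is a minimal sketch under made-up assumptions. There are two binary observables `X1` and `X2`, a two-class latent source `U`, and an empty DAG inside each class. All names and probabilities are hypothetical and are not taken from the paper. Within each class, `X1` and `X2` are independent. Once the class is marginalised out, however, they become dependent, which is the kind of spurious association a causal discovery method has to see past.

```python
from itertools import product

# Hypothetical two-class mixture over two binary observables X1, X2.
# Within each latent class U = u the DAG has no edges, so X1 and X2 are
# independent given U. The numbers are illustrative assumptions only.
weights = {0: 0.5, 1: 0.5}   # P(U = u)
p_x1 = {0: 0.9, 1: 0.1}      # P(X1 = 1 | U = u)
p_x2 = {0: 0.8, 1: 0.2}      # P(X2 = 1 | U = u)


def bern(p, x):
    """Probability that a Bernoulli(p) variable takes value x in {0, 1}."""
    return p if x == 1 else 1 - p


def joint(x1, x2):
    """Observed P(X1 = x1, X2 = x2), summing over the latent class U."""
    return sum(w * bern(p_x1[u], x1) * bern(p_x2[u], x2)
               for u, w in weights.items())


def marginal_x1(x1):
    return sum(joint(x1, x2) for x2 in (0, 1))


def marginal_x2(x2):
    return sum(joint(x1, x2) for x1 in (0, 1))


# Marginal independence would make this gap zero for every cell; the latent
# common cause U makes it nonzero.
gap = max(abs(joint(a, b) - marginal_x1(a) * marginal_x2(b))
          for a, b in product((0, 1), repeat=2))
print(round(gap, 4))  # → 0.12
```

Setting the two classes to share the same conditional probabilities (for example `p_x1 = {0: 0.5, 1: 0.5}`) drives the gap to zero. This shows that the dependence comes entirely from the class-specific signatures and not from any edge between `X1` and `X2`.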