Directed acyclic graphs (DAGs) are used to model the causal structure of a system. ``Causal discovery'' is the problem of learning this structure from data. When data are aggregated from multiple sources (populations or environments), global confounding obscures the conditional independence properties that drive many causal discovery algorithms. For this reason, existing causal discovery algorithms are not suitable for the multiple-source setting. We demonstrate that, if the confounding is of bounded cardinality (i.e., the data come from a limited number of sources), causal discovery can still be achieved. The feasibility of this problem is governed by a trade-off between the cardinality of the global confounder, the cardinalities of the observed variables, and the sparsity of the causal structure.
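To illustrate how pooling sources can obscure independence, the following minimal sketch builds a toy example that is not taken from the paper: a hidden binary source label `U` and two binary variables `X` and `Y` that are independent within each source. All names and probabilities are hypothetical choices made for illustration. The computation is exact (rational arithmetic) and shows that the pooled distribution exhibits dependence between `X` and `Y` even though neither source does.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical setup (not from the paper): a hidden source label U
# with two equally likely values, acting as a global confounder.
P_U = {0: F(1, 2), 1: F(1, 2)}

# Within each source, X and Y are independent Bernoulli variables.
P_X1_GIVEN_U = {0: F(1, 10), 1: F(9, 10)}  # P(X=1 | U=u)
P_Y1_GIVEN_U = {0: F(1, 10), 1: F(9, 10)}  # P(Y=1 | U=u)

OUTCOMES = list(product((0, 1), repeat=2))


def bern(p1, value):
    """Probability that a Bernoulli(p1) variable takes `value`."""
    return p1 if value == 1 else 1 - p1


def joint_given_source(u):
    """Joint distribution of (X, Y) inside source u (a product distribution)."""
    return {
        (x, y): bern(P_X1_GIVEN_U[u], x) * bern(P_Y1_GIVEN_U[u], y)
        for x, y in OUTCOMES
    }


def pooled_joint():
    """Joint distribution of (X, Y) after marginalizing out the source label."""
    pooled = {xy: F(0) for xy in OUTCOMES}
    for u, p_u in P_U.items():
        for xy, p in joint_given_source(u).items():
            pooled[xy] += p_u * p
    return pooled


def dependence_gap(joint):
    """P(X=1, Y=1) - P(X=1) P(Y=1); zero iff a binary pair is independent."""
    p_x1 = joint[(1, 0)] + joint[(1, 1)]
    p_y1 = joint[(0, 1)] + joint[(1, 1)]
    return joint[(1, 1)] - p_x1 * p_y1


if __name__ == "__main__":
    for u in P_U:
        print(f"gap within source {u}:", dependence_gap(joint_given_source(u)))
    print("gap in pooled data:", dependence_gap(pooled_joint()))
```

In this toy case the gap is zero within each source and nonzero in the pooled data, so an independence test applied to the aggregate would report a dependence between `X` and `Y` that no single source contains. Recovering the within-source structure then depends on accounting for the unobserved label `U`.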