Many real-world decision-making tasks require learning causal relationships among a set of variables. Traditional causal discovery methods, however, require that all variables be observed, which is often infeasible in practice. Without additional assumptions about the unobserved variables, it is not possible to recover any causal relationships from observational data. Fortunately, in many applied settings the confounders can be expected to exhibit additional structure. In particular, pervasive confounding, where the latent confounders affect many of the observed variables, is commonly encountered and has been exploited for consistent causal estimation in linear causal models. In this paper, we present a provably consistent method for estimating causal relationships in the non-linear, pervasive-confounding setting. The core of our procedure is the ability to estimate the confounding variation through a simple spectral decomposition of the observed data matrix. Building on this insight, we derive a DAG score function, prove that it consistently recovers a correct ordering of the DAG, and compare it empirically with previous approaches. By explicitly accounting for both confounders and non-linear effects, our method improves performance on both simulated and real datasets.
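To illustrate why a spectral decomposition can expose pervasive confounding, here is a minimal NumPy sketch of the general idea, not the paper's estimator or DAG score. It assumes the number of latent confounders `q` is known, and the function name `estimate_confounding` and the toy simulation (one dense confounder, no causal edges between observed variables) are hypothetical. Because a pervasive confounder loads on many variables, it dominates the leading singular directions of the centered data matrix, so a rank-`q` truncated SVD approximates the confounding variation and the remainder approximates the deconfounded signal.

```python
import numpy as np


def estimate_confounding(X, q):
    """Split a data matrix into a rank-q confounding estimate and a residual.

    X : (n, p) array of observations (rows = samples, columns = variables).
    q : assumed number of latent confounders (treated as known here).
    Returns (L, R, U_q): L is the rank-q part of the centered data,
    R = centered X - L, and U_q holds the top-q left singular vectors.
    """
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Under pervasive confounding the leading singular directions are
    # dominated by the confounders, so the rank-q truncation estimates them.
    L = (U[:, :q] * s[:q]) @ Vt[:q, :]
    return L, Xc - L, U[:, :q]


# Hypothetical toy simulation: one latent confounder H loading on all p variables.
rng = np.random.default_rng(0)
n, p, q = 500, 50, 1
H = rng.normal(size=(n, q))
Gamma = rng.normal(size=(q, p))                  # dense loadings -> pervasive
X = H @ Gamma + 0.5 * rng.normal(size=(n, p))

L, R, U_q = estimate_confounding(X, q)
# Singular vectors are defined up to sign, so compare absolute correlation.
corr = abs(np.corrcoef(H[:, 0], U_q[:, 0])[0, 1])
print(f"|corr(H, leading singular vector)| = {corr:.3f}")
```

In this toy setting the leading left singular vector closely tracks the hidden confounder; in a full method the residual `R` (or a related adjustment) would then feed into a non-linear causal-ordering step, which this sketch does not attempt.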