Causal Discovery with Mixed Latent Confounding via Precision Decomposition

We study causal discovery from observational data in linear Gaussian systems affected by \emph{mixed latent confounding}, in which some unobserved factors act broadly across many variables while others influence only small subsets. This setting is common in practice and challenges existing methods: differentiable and score-based DAG learners can mistake global latent effects for causal edges, while latent-variable graphical models recover only undirected structure. We propose \textsc{DCL-DECOR}, a modular, precision-based pipeline that separates these two roles. The method first isolates pervasive latent effects by decomposing the observed precision matrix into a structured component and a low-rank component. The structured component is the precision matrix of the observed variables conditional on the pervasive confounders, so it retains only the local dependence induced by the causal graph and by localized confounding. A correlated-noise DAG learner is then applied to this deconfounded representation to recover directed edges while modeling the remaining structured error correlations, and a simple reconciliation step enforces bow-freeness, i.e., that no pair of variables is linked by both a directed edge and a correlated error. We provide identifiability results that characterize the causal target recoverable under mixed confounding, and we show how the overall problem reduces to well-studied subproblems with modular guarantees. Synthetic experiments that vary the strength and dimensionality of pervasive confounding demonstrate consistent improvements in directed-edge recovery over applying correlated-noise DAG learning directly to the confounded data.
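
A minimal NumPy sketch of the population identity behind the first step, under illustrative assumptions that are ours rather than the paper's: a linear SEM $X = BX + \Gamma H + E$ with a strictly lower-triangular $B$, pervasive latents $H \sim N(0, I_k)$, and errors $E \sim N(0, \Omega)$ whose only off-diagonal entry couples $X_0$ and $X_1$ (a localized confounder). With $A = I - B$, the Woodbury identity gives $\Theta = \Sigma^{-1} = S - L$, where $S = A^\top \Omega^{-1} A$ is the precision of $X$ given $H$ and $L = A^\top \Omega^{-1}\Gamma (I_k + \Gamma^\top \Omega^{-1}\Gamma)^{-1}\Gamma^\top \Omega^{-1} A$ is positive semidefinite with rank at most $k$. The helper names `simulate_mixed_confounding` and `precision_decomposition` are hypothetical; the code checks the population decomposition only and is not the \textsc{DCL-DECOR} estimator, which must recover $S$ and $L$ from finite samples.

```python
import numpy as np


def simulate_mixed_confounding(p=8, k=2, seed=0):
    """Hypothetical linear Gaussian SEM X = B X + Gamma H + E (illustrative only).

    B is strictly lower triangular (a DAG in a fixed causal order),
    H ~ N(0, I_k) are pervasive latents loading on every variable, and
    E ~ N(0, Omega) with one localized error correlation between X0 and X1.
    """
    rng = np.random.default_rng(seed)
    # Sparse DAG: keep each below-diagonal edge with probability 0.3.
    mask = np.tril(rng.random((p, p)) < 0.3, k=-1)
    weights = rng.uniform(0.5, 1.0, size=(p, p)) * rng.choice([-1.0, 1.0], size=(p, p))
    B = mask * weights
    Gamma = rng.normal(size=(p, k))   # dense loadings: pervasive confounding
    Omega = np.eye(p)
    Omega[0, 1] = Omega[1, 0] = 0.4   # localized confounding (a bidirected edge)
    return B, Gamma, Omega


def precision_decomposition(B, Gamma, Omega):
    """Return the observed covariance Sigma and the terms S, L with inv(Sigma) = S - L."""
    p, k = Gamma.shape
    A = np.eye(p) - B
    A_inv = np.linalg.inv(A)
    Sigma = A_inv @ (Gamma @ Gamma.T + Omega) @ A_inv.T  # covariance of observed X
    Omega_inv = np.linalg.inv(Omega)
    S = A.T @ Omega_inv @ A                   # precision of X given H: DAG + local terms
    U = A.T @ Omega_inv @ Gamma               # p x k
    M = np.linalg.inv(np.eye(k) + Gamma.T @ Omega_inv @ Gamma)
    L = U @ M @ U.T                           # Woodbury correction: PSD, rank <= k
    return Sigma, S, L


B, Gamma, Omega = simulate_mixed_confounding()
Sigma, S, L = precision_decomposition(B, Gamma, Omega)
Theta = np.linalg.inv(Sigma)                  # marginal precision of the observed X

print("Theta == S - L:", np.allclose(Theta, S - L))
print("rank(L):", np.linalg.matrix_rank(L))
print("min eigenvalue of L:", np.linalg.eigvalsh(L).min())
# Pervasive factors densify Theta; S keeps only entries implied by the DAG
# and by the localized X0/X1 error block.
print("nonzeros in Theta:", np.sum(np.abs(Theta) > 1e-10))
print("nonzeros in S:", np.sum(np.abs(S) > 1e-10))
```

Because every variable loads on $H$, $\Theta$ is generically dense, whereas $S$ is nonzero only on pairs linked through the DAG (adjacent or sharing a child) or through the $X_0$, $X_1$ error block. This is the sense in which the structured component is a deconfounded input for the correlated-noise DAG learner.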