Linear causal models are important tools for modeling causal dependencies, yet in practice only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research: we allow all variables, both observed and latent, to be flexibly related, and we consider the coefficients of all edges, whereas most existing works focus only on the edges between observed variables. Theoretically, we identify three types of indeterminacy for the parameters in partially observed linear causal models. We then provide graphical conditions that are sufficient for all parameters to be identifiable and show that some of them are provably necessary. Methodologically, we propose a novel likelihood-based parameter estimation method that addresses the variance indeterminacy of latent variables in a specific way and can asymptotically recover the underlying parameters up to trivial indeterminacy. Empirical studies on both synthetic and real-world datasets validate our identifiability theory and the effectiveness of the proposed method in the finite-sample regime.
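To make the variance indeterminacy of latent variables concrete, the following is a minimal sketch and not the estimation method proposed in the paper. It assumes a linear model of the form X = BX + E with independent noise terms, where `B[i, j]` is the coefficient of the edge from variable j to variable i; the graph, the coefficient values, and the helper names `implied_covariance` and `rescale_variable` are all hypothetical choices made for illustration. Rescaling a latent variable by a constant changes its outgoing edge coefficients and its noise variance, but leaves the covariance of the observed variables unchanged, so the two parameter sets cannot be told apart from second-order statistics of the observed data.

```python
import numpy as np


def implied_covariance(B, omega):
    """Covariance of X under X = B @ X + E, with Cov(E) = diag(omega)."""
    d = B.shape[0]
    A = np.linalg.inv(np.eye(d) - B)  # X = (I - B)^{-1} E
    return A @ np.diag(omega) @ A.T


def rescale_variable(B, omega, idx, c):
    """Parameters of the equivalent model in which variable idx is replaced by c * X[idx]."""
    B2 = B.astype(float).copy()
    omega2 = omega.astype(float).copy()
    B2[idx, :] *= c        # edges into idx are multiplied by c
    B2[:, idx] /= c        # edges out of idx are divided by c
    omega2[idx] *= c ** 2  # noise variance of idx is multiplied by c^2
    return B2, omega2


# Hypothetical graph: variable 0 is latent; variables 1, 2, 3 are observed.
B = np.zeros((4, 4))
B[1, 0], B[2, 0], B[3, 0] = 0.8, -1.2, 0.5  # latent -> observed edges
B[3, 1] = 0.7                               # observed -> observed edge
omega = np.array([1.0, 0.5, 0.3, 0.4])      # noise variances

B_scaled, omega_scaled = rescale_variable(B, omega, idx=0, c=2.0)

observed = [1, 2, 3]
sigma = implied_covariance(B, omega)[np.ix_(observed, observed)]
sigma_scaled = implied_covariance(B_scaled, omega_scaled)[np.ix_(observed, observed)]

# Different parameters, same observed covariance.
same_observed_cov = np.allclose(sigma, sigma_scaled)
params_differ = not np.allclose(B, B_scaled)
print(same_observed_cov, params_differ)
```

In this sketch the edge between the two observed variables (`B[3, 1]`) is unaffected by the rescaling, while every edge leaving the latent variable changes. One common way to remove this kind of ambiguity is to fix the scale of each latent variable by convention; whether that matches the specific treatment used in the paper is not something the abstract states.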