Learning causal structure from observational data alone is a challenging problem. The task becomes easier given data collected from perturbations of the underlying system, even when the nature of these perturbations is unknown. Existing methods either do not allow for latent variables or assume that the latent variables remain unperturbed. Both assumptions are hard to justify when the nature of the perturbations is unknown. We provide results that enable scoring causal structures in the setting with additive but unknown interventions. Specifically, we propose a maximum-likelihood estimator in a structural equation model that exploits system-wide invariances to output an equivalence class of causal structures from perturbation data. Furthermore, under certain structural assumptions on the population model, we provide a simple graphical characterization of all the DAGs in the interventional equivalence class. We illustrate the utility of our framework on synthetic data as well as on real data involving California reservoirs and protein expression. The software implementation is available as the Python package \emph{utlvce}.