Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. Here we show that identifiability can still be achieved when the causal variables are unobserved, given a generalized notion of faithfulness. Our results guarantee that, in the limit of infinite data, we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
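To make the data setting concrete, the following is a minimal sketch of how unpaired observational and interventional samples could arise from a latent causal model. It is an illustration under stated assumptions, not the paper's model or algorithm: the linear-Gaussian mechanisms, the mean-shift form of the intervention, the linear mixing map, and all names (`sample_latents`, `mix`, `weights`, `mixing`, `shift`) are hypothetical choices made for this example.

```python
import numpy as np


def sample_latents(n, weights, rng, target=None, shift=0.0):
    """Ancestral sampling from an assumed linear-Gaussian latent causal model.

    weights[j, i] is the strength of the edge j -> i, with nodes in
    topological order (so weights is strictly upper triangular).
    If target is given, that node's mechanism is changed by adding
    `shift` to its conditional mean (an assumed soft intervention).
    """
    d = weights.shape[0]
    z = np.zeros((n, d))
    for i in range(d):
        noise = rng.normal(size=n)
        # Contribution of the parents of node i (zero for a root node).
        mean = z[:, :i] @ weights[:i, i]
        if target is not None and i == target:
            mean = mean + shift
        z[:, i] = mean + noise
    return z


def mix(z, mixing):
    """Map latent variables to observed data (assumed linear mixing)."""
    return z @ mixing.T


rng = np.random.default_rng(0)

# Assumed latent graph: z0 -> z1 -> z2.
weights = np.array([
    [0.0, 0.8, 0.0],
    [0.0, 0.0, -0.5],
    [0.0, 0.0, 0.0],
])
mixing = rng.normal(size=(5, 3))  # 3 latent variables, 5 observed features

# Unpaired data: each dataset is an independent draw, so no observational
# sample corresponds to any particular interventional sample.
x_obs = mix(sample_latents(1000, weights, rng), mixing)
x_int = {
    t: mix(sample_latents(1000, weights, rng, target=t, shift=2.0), mixing)
    for t in range(3)
}
```

In this sketch only `x_obs` and the datasets in `x_int` would be available to a learner; `weights`, `mixing`, and the latent samples are the unobserved quantities that causal disentanglement tries to recover up to an equivalence class.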