Causal disentanglement aims to uncover a representation of data in terms of latent variables that are related to one another through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the setting where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. Here we show that identifiability can still be achieved when the causal variables are unobserved, given a generalized notion of faithfulness. Our results guarantee that, in the limit of infinite data, we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
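To make the data setting concrete, the following is a minimal sketch of unpaired observational and interventional data generated from a latent causal model. It is a toy illustration only, not the paper's model or algorithm: the three-variable linear chain, the edge weight 0.8, the mean-shift form of the intervention, the `tanh` mixing, and the names `sample_latents`, `mix`, `N_LATENT`, and `N_OBSERVED` are all assumptions introduced here.

```python
import numpy as np

N_LATENT, N_OBSERVED = 3, 5  # hypothetical sizes, chosen only for illustration


def sample_latents(n, rng, target=None, shift=2.0):
    """Draw n samples from a toy linear chain Z0 -> Z1 -> Z2.

    If `target` is given, that variable's mechanism is changed by shifting
    its noise mean; its dependence on its parent is kept (an assumed,
    illustrative form of intervention).
    """
    z = np.zeros((n, N_LATENT))
    for j in range(N_LATENT):
        noise = rng.normal(size=n)
        if j == target:
            noise = noise + shift  # the intervention: a changed mechanism
        parent = 0.8 * z[:, j - 1] if j > 0 else 0.0
        z[:, j] = parent + noise
    return z


def mix(z, G):
    """Map latents to observations with an assumed mixing function.

    The map is injective whenever G has full column rank.
    """
    return np.tanh(z @ G.T)


rng = np.random.default_rng(0)
G = rng.normal(size=(N_OBSERVED, N_LATENT))

# Unpaired data: every dataset is an independent draw, so no observational
# sample is matched to an interventional one.
x_obs = mix(sample_latents(1000, rng), G)
x_int = {j: mix(sample_latents(1000, rng, target=j), G) for j in range(N_LATENT)}
```

In this sketch a learner would see only `x_obs` and the arrays in `x_int`; the latent samples, the matrix `G`, and the chain structure are hidden, and recovering them up to an equivalence class is the identifiability question the abstract describes.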