Causal disentanglement seeks a representation of data in terms of latent variables that are related to one another through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the setting where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. Here we show that identifiability can still be achieved when the causal variables are unobserved, given a generalized notion of faithfulness. Our results guarantee that, in the limit of infinite data, we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
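To make the data setting concrete, the following is a minimal simulation sketch of unpaired observational and interventional data generated from latent causal variables. It is an illustration only, not the method of the paper: the three-variable chain, the linear Gaussian mechanisms, the linear mixing map, the mean-shift form of the intervention, and the names `WEIGHTS`, `sample_latents`, and `make_unpaired_datasets` are all assumptions introduced here.

```python
import numpy as np

# Hypothetical latent chain z0 -> z1 -> z2.
# WEIGHTS[i, j] is the linear effect of latent j on latent i; the matrix is
# strictly lower triangular, so the index order is a topological order.
WEIGHTS = np.array([[0.0, 0.0, 0.0],
                    [1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])


def sample_latents(n, rng, shift=None):
    """Draw n samples from a linear Gaussian model over the latents.

    shift, if given, is a (target, delta) pair: an assumed intervention that
    changes the mechanism of one latent by adding delta to it.
    """
    d = WEIGHTS.shape[0]
    z = np.zeros((n, d))
    for i in range(d):
        # Parents of latent i have smaller indices and are already filled in.
        z[:, i] = z @ WEIGHTS[i] + rng.normal(0.0, 1.0, size=n)
        if shift is not None and shift[0] == i:
            z[:, i] += shift[1]
    return z


def make_unpaired_datasets(n=2000, obs_dim=5, seed=0):
    """Return one observational dataset and one dataset per intervention.

    Every dataset uses fresh latent samples, so no observation in one dataset
    is matched to an observation in another (the data are unpaired).
    """
    rng = np.random.default_rng(seed)
    # Assumed linear mixing from latents to observations, shared by all datasets.
    mixing = rng.normal(size=(obs_dim, WEIGHTS.shape[0]))
    data = {"obs": sample_latents(n, rng) @ mixing.T}
    for target in range(WEIGHTS.shape[0]):
        z = sample_latents(n, rng, shift=(target, 2.0))
        data[f"int_{target}"] = z @ mixing.T
    return data, mixing
```

In this sketch only the mixed observations in `data` would be available to a learner; the latents, the graph in `WEIGHTS`, and `mixing` are the unobserved quantities that a causal disentanglement method would try to recover up to an equivalence class.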