Causal representation learning seeks to extract high-level latent factors from low-level sensory data. Most existing methods rely on observational data and structural assumptions (e.g., conditional independence) to identify the latent factors. However, interventional data is prevalent across applications. Can interventional data facilitate causal representation learning? We explore this question in this paper. The key observation is that interventional data often carries geometric signatures of the latent factors' support (i.e., the set of values each latent can take). For example, when the latent factors are causally connected, interventions can break the dependency between the support of the intervened latents and the support of their ancestors. Leveraging this fact, we prove that the latent causal factors can be identified up to permutation and scaling given data from perfect $do$ interventions. Moreover, given data from imperfect interventions, we can achieve block affine identification, namely, each estimated latent factor is entangled with only a few other latents. These results highlight the unique power of interventional data in causal representation learning: they can enable provable identification of latent factors without any assumptions about their distributions or dependency structure.
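To make the support argument concrete, here is a minimal toy sketch built on assumptions of our own rather than on the paper's setup. It uses two latents with $z_1 \to z_2$, where observationally $z_1 \sim U(0,1)$ and $z_2 = z_1 + u$ with $u \sim U(0, 0.2)$. Under a perfect intervention, $z_2$ is instead drawn from $U(0, 1.2)$, independently of $z_1$. The helper names `sample_latents` and `support_occupancy` are hypothetical. The code estimates what fraction of the product of the marginal supports, $[0,1] \times [0,1.2]$, is covered by the joint samples.

```python
import random


def sample_latents(n, intervene, seed=0):
    """Sample (z1, z2) from a toy two-node SCM z1 -> z2 (hypothetical example)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z1 = rng.uniform(0.0, 1.0)
        if intervene:
            # Perfect intervention: z2 ignores its parent z1 entirely.
            z2 = rng.uniform(0.0, 1.2)
        else:
            # Observational: z2 stays within a narrow band above z1.
            z2 = z1 + rng.uniform(0.0, 0.2)
        points.append((z1, z2))
    return points


def support_occupancy(points, bins=20, box=((0.0, 1.0), (0.0, 1.2))):
    """Fraction of grid cells in the product of marginal supports hit by samples."""
    (x_lo, x_hi), (y_lo, y_hi) = box
    occupied = set()
    for x, y in points:
        # Clamp so that values equal to the upper bound fall in the last cell.
        i = min(int((x - x_lo) / (x_hi - x_lo) * bins), bins - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * bins), bins - 1)
        occupied.add((i, j))
    return len(occupied) / (bins * bins)


if __name__ == "__main__":
    obs = support_occupancy(sample_latents(20000, intervene=False))
    intv = support_occupancy(sample_latents(20000, intervene=True))
    print(f"observational fill: {obs:.2f}, interventional fill: {intv:.2f}")
```

In the observational case, the joint support is a thin band, so it fills well under half of the rectangle, because the values $z_2$ can take depend on $z_1$. After the intervention, the joint support becomes the full rectangle: the support of $z_2$ no longer depends on its ancestor. This is the kind of geometric signature described above. The grid-occupancy measure is only an illustrative diagnostic, not an identification procedure.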