Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (and thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed both as a generalization of ICA that models the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph and focuses solely on learning the unmixing function and the causal mechanisms. Any impossibility result regarding the recovery of the ground truth in CauCA also applies to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph), requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA settings.
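
To make the data setting concrete, the following is a minimal sketch of how multi-dataset CauCA data could be simulated. Everything in it is an illustrative assumption rather than the setup used in the paper: the two-node graph z1 -> z2, the linear Gaussian mechanisms, the particular mixing function, and the helper names `sample_latents`, `mix`, and `make_datasets` are all hypothetical. It shows one observational dataset plus one dataset per intervened latent, all passed through the same invertible nonlinear mixing.

```python
import numpy as np

def sample_latents(n, rng, intervene_on=None):
    """Sample latents from a toy SCM with known graph z1 -> z2 (assumed example).

    intervene_on=None gives observational data; 1 or 2 replaces the
    mechanism of that node while leaving the other mechanism unchanged.
    """
    if intervene_on == 1:
        z1 = rng.normal(2.0, 1.0, size=n)   # shifted mechanism for z1
    else:
        z1 = rng.normal(0.0, 1.0, size=n)
    if intervene_on == 2:
        # Intervention that removes the dependence of z2 on its parent z1.
        z2 = rng.normal(0.0, 1.0, size=n)
    else:
        z2 = 0.8 * z1 + rng.normal(0.0, 0.5, size=n)
    return np.stack([z1, z2], axis=1)

def mix(z):
    """Invertible nonlinear mixing (assumed for illustration).

    An invertible linear map followed by an elementwise, strictly
    increasing function, so the composition is invertible.
    """
    A = np.array([[1.0, 0.5], [-0.3, 1.0]])  # det = 1.15, hence invertible
    h = z @ A.T
    return h + 0.5 * np.tanh(h)

def make_datasets(n, seed=0):
    """Return observed datasets keyed by the intervened node (None = observational)."""
    rng = np.random.default_rng(seed)
    return {k: mix(sample_latents(n, rng, intervene_on=k)) for k in (None, 1, 2)}

if __name__ == "__main__":
    data = make_datasets(1000)
    for key, x in data.items():
        print(key, x.shape)
```

In this sketch the learner would see only the observed arrays and the graph z1 -> z2; the latents and the mixing are hidden and are what an unmixing model would try to recover.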