Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed both as a generalization of ICA that models the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph and focuses solely on learning the unmixing function and the causal mechanisms. Any impossibility result regarding the recovery of the ground truth in CauCA also applies to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph) that require strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA settings.
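To make the setting concrete, the following is a minimal illustrative sketch of the kind of data CauCA assumes: latent variables generated according to a known causal graph, observed only through an invertible nonlinear mixing, with one dataset per intervention. Everything specific here is an assumption made for illustration and is not taken from the paper: the two-node graph z1 -> z2, the linear Gaussian mechanisms, the leaky-ReLU mixing, and the names `sample_latents`, `mix`, and `unmix` are all hypothetical. The sketch also uses the true inverse of the mixing rather than a learned normalizing flow.

```python
import numpy as np

SLOPE = 0.2  # negative-side slope of the leaky-ReLU mixing (assumed)


def sample_latents(n, rng, intervene_on=None):
    """Sample latents from a toy SCM with known graph z1 -> z2.

    intervene_on=None gives the observational distribution.
    intervene_on=2 replaces the mechanism of z2 with fresh noise, a perfect
    stochastic intervention that removes the dependence of z2 on z1.
    """
    if intervene_on == 1:
        z1 = rng.normal(loc=2.0, scale=0.5, size=n)
    else:
        z1 = rng.normal(size=n)
    if intervene_on == 2:
        z2 = rng.normal(loc=1.0, scale=0.5, size=n)
    else:
        z2 = 0.8 * z1 + 0.6 * rng.normal(size=n)
    return np.stack([z1, z2], axis=1)


def mix(z, A):
    """Invertible nonlinear mixing: linear map followed by a leaky ReLU."""
    h = z @ A.T
    return np.where(h > 0, h, SLOPE * h)


def unmix(x, A):
    """Exact inverse of `mix`; in CauCA this function would be learned."""
    h = np.where(x > 0, x, x / SLOPE)
    return h @ np.linalg.inv(A).T


rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5], [-0.3, 1.0]])  # invertible mixing matrix (assumed)

# One observational dataset and one dataset with an intervention on z2.
z_obs = sample_latents(20000, rng)
z_int = sample_latents(20000, rng, intervene_on=2)
x_obs, x_int = mix(z_obs, A), mix(z_int, A)

# With the correct unmixing, the recovered latents are dependent in the
# observational data and independent after the intervention on z2.
corr_obs = np.corrcoef(unmix(x_obs, A), rowvar=False)[0, 1]
corr_int = np.corrcoef(unmix(x_int, A), rowvar=False)[0, 1]
print(f"corr(z1, z2) observational:   {corr_obs:.2f}")
print(f"corr(z1, z2) intervened (z2): {corr_int:.2f}")
```

In this toy example the intervention changes the dependence structure of the latents in a way that a correct unmixing function reproduces. That change across datasets is the kind of signal the identifiability results rely on. In the ICA special case the graph is empty, so the latents would be independent in every dataset.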