Independent Component Analysis (ICA) aims to recover independent latent variables from observed mixtures thereof. Causal Representation Learning (CRL) aims instead to infer causally related (thus often statistically dependent) latent variables, together with the unknown graph encoding their causal relationships. We introduce an intermediate problem termed Causal Component Analysis (CauCA). CauCA can be viewed as a generalization of ICA, modelling the causal dependence among the latent components, and as a special case of CRL. In contrast to CRL, it presupposes knowledge of the causal graph, focusing solely on learning the unmixing function and the causal mechanisms. Any impossibility results regarding the recovery of the ground truth in CauCA also apply to CRL, while possibility results may serve as a stepping stone for extensions to CRL. We characterize CauCA identifiability from multiple datasets generated through different types of interventions on the latent causal variables. As a corollary, this interventional perspective also leads to new identifiability results for nonlinear ICA (a special case of CauCA with an empty graph), requiring strictly fewer datasets than previous results. We introduce a likelihood-based approach using normalizing flows to estimate both the unmixing function and the causal mechanisms, and demonstrate its effectiveness through extensive synthetic experiments in the CauCA and ICA settings.
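To make the CauCA setup and the likelihood objective concrete, here is a toy sketch under loudly stated assumptions: a known two-variable graph z1 → z2 with linear-Gaussian mechanisms, a hand-picked invertible mixing x = sinh(A z), and a perfect stochastic intervention on z2 that produces a second dataset. All names (`mix`, `unmix`, `sample_latents`, `latent_logpdf`, `log_likelihood`) and all numeric choices are hypothetical. The abstract's method learns the unmixing with a normalizing flow; here the ground-truth unmixing is plugged in so that only the change-of-variables likelihood, log p_x(x) = log p_z(g(x)) + log|det J_g(x)|, with p_z factorized along the known graph, is illustrated.

```python
import numpy as np

LOG_2PI = np.log(2.0 * np.pi)


def gauss_logpdf(x, mean, std):
    """Elementwise log-density of N(mean, std**2)."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * LOG_2PI


# Hypothetical ground-truth mixing f(z) = sinh(A z): invertible because A is
# invertible and sinh is strictly increasing.
A = np.array([[1.0, 0.5],
              [-0.3, 1.2]])
A_INV = np.linalg.inv(A)


def mix(z):
    return np.sinh(z @ A.T)


def unmix(x):
    # g = f^{-1}: z = A^{-1} arcsinh(x)
    return np.arcsinh(x) @ A_INV.T


def log_abs_det_unmix(x):
    # J_g(x) = A^{-1} diag(1 / sqrt(1 + x_i^2)), so the log|det| splits in two.
    _, logdet_a_inv = np.linalg.slogdet(A_INV)
    return logdet_a_inv - 0.5 * np.sum(np.log1p(x ** 2), axis=1)


def sample_latents(rng, n, intervene_z2=False):
    """Known graph z1 -> z2; optionally a perfect stochastic intervention on z2."""
    z1 = rng.normal(size=n)
    if intervene_z2:
        z2 = rng.normal(loc=2.0, scale=1.0, size=n)  # edge z1 -> z2 is cut
    else:
        z2 = 0.8 * z1 + 0.5 * rng.normal(size=n)
    return np.stack([z1, z2], axis=1)


def latent_logpdf(z, intervene_z2=False):
    """log p(z) = log p(z1) + log p(z2 | parents), following the known graph."""
    lp = gauss_logpdf(z[:, 0], 0.0, 1.0)
    if intervene_z2:
        lp += gauss_logpdf(z[:, 1], 2.0, 1.0)
    else:
        lp += gauss_logpdf(z[:, 1], 0.8 * z[:, 0], 0.5)
    return lp


def log_likelihood(x, intervene_z2=False):
    """Change of variables: log p_x(x) = log p_z(g(x)) + log|det J_g(x)|."""
    return latent_logpdf(unmix(x), intervene_z2) + log_abs_det_unmix(x)


rng = np.random.default_rng(0)
x_obs = mix(sample_latents(rng, 5000))                     # observational dataset
x_int = mix(sample_latents(rng, 5000, intervene_z2=True))  # interventional dataset
print(log_likelihood(x_obs).mean(), log_likelihood(x_obs, intervene_z2=True).mean())
```

In this sketch the unmixing g is shared across datasets while only the intervened mechanism changes, which mirrors the multi-dataset structure described above; in a learned setting, g would be a trainable flow and the mechanisms trainable conditional densities, with the summed log-likelihood over all datasets maximized jointly.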