The goal of causal representation learning is to find a representation of data that consists of causally related latent variables. We consider a setup where one has access to data from multiple domains that potentially share a causal representation. Crucially, observations in different domains are assumed to be unpaired, that is, we only observe the marginal distribution in each domain but not their joint distribution. In this paper, we give sufficient conditions for identifiability of the joint distribution and the shared causal graph in a linear setup. Identifiability holds if we can uniquely recover the joint distribution and the shared causal representation from the marginal distributions in each domain. We transform our identifiability results into a practical method to recover the shared latent causal graph. Moreover, we study how having multiple domains reduces the false detection of shared causal variables in the finite-data setting.
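To make the unpaired multi-domain setting concrete, the following is a minimal simulation sketch, not the method of the paper. It assumes a linear structural model over a few shared latent variables, one private latent per domain, Laplace-distributed noise, and random linear mixing matrices; the function name `sample_unpaired_domains` and all of its parameters are hypothetical choices made for illustration. Each domain draws its own latent samples, so only the per-domain marginal distributions are available and no observation is matched across domains.

```python
import numpy as np


def sample_unpaired_domains(dims=(6, 5), n_samples=(500, 400),
                            n_shared=2, n_private=1, seed=0):
    """Simulate unpaired observations from domains that share latent causes.

    Illustrative assumptions (not taken from the paper): Laplace noise,
    one private latent per domain, random Gaussian mixing matrices.
    """
    rng = np.random.default_rng(seed)

    # Shared latent graph: strictly upper-triangular weights keep it acyclic.
    A = np.triu(rng.normal(size=(n_shared, n_shared)), k=1)
    # Solve Z = A Z + eps for Z, giving Z = (I - A)^{-1} eps.
    solve = np.linalg.inv(np.eye(n_shared) - A)

    domains = []
    for d, n in zip(dims, n_samples):
        # Fresh latent draws per domain: samples are never paired across domains.
        eps = rng.laplace(size=(n, n_shared))
        z_shared = eps @ solve.T
        z_private = rng.laplace(size=(n, n_private))
        z = np.hstack([z_shared, z_private])

        # Domain-specific linear mixing from latents to observations.
        G = rng.normal(size=(d, n_shared + n_private))
        domains.append(z @ G.T)

    return A, domains


A, domains = sample_unpaired_domains()
for i, X in enumerate(domains):
    print(f"domain {i}: {X.shape[0]} samples, {X.shape[1]} observed variables")
```

The domains can have different sample sizes and different observed dimensions, which is why a joint distribution cannot be estimated directly and has to be recovered from the marginals.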