Causal representation learning aims to identify high-level causal variables from perceptual data. Most methods assume that all latent causal variables are captured in the high-dimensional observations. We instead consider a partially observed setting, in which each measurement provides information about only a subset of the underlying causal state. Prior work has studied this setting with multiple domains or views, each depending on a fixed subset of latents. Here, we focus on learning from unpaired observations drawn from a dataset with an instance-dependent partial observability pattern. Our main contribution is to establish two identifiability results for this setting: one for linear mixing functions without parametric assumptions on the underlying causal model, and one for piecewise linear mixing functions with Gaussian latent causal variables. Based on these insights, we propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation. Experiments on several simulated datasets and established benchmarks highlight the effectiveness of our approach in recovering the ground-truth latents.
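The intuition behind enforcing sparsity in the linear case can be illustrated with a minimal Python sketch. The modeling choices here are assumptions for illustration, not taken from the abstract: partial observability is modeled as an instance-dependent binary mask m that zeroes out the unobserved latents, so that x = A(m ⊙ z) with a square, invertible mixing matrix A. The helper names simulate_partially_observed and avg_l0 are hypothetical. The sketch does not implement either of the proposed estimators; it only checks that the true unmixing (the inverse of A) yields a sparser representation than an arbitrary other invertible unmixing.

```python
import numpy as np


def simulate_partially_observed(n, d, p_obs, rng):
    """Hypothetical generator: x = A (m * z) with an instance-dependent mask m."""
    z = rng.standard_normal((n, d))                   # latent variables
    m = (rng.random((n, d)) < p_obs).astype(float)    # 1 = latent present in this sample
    A = rng.standard_normal((d, d))                   # linear mixing, invertible almost surely
    x = (m * z) @ A.T                                 # row i equals A @ (m_i * z_i)
    return x, m * z, A


def avg_l0(z_hat, tol=1e-6):
    """Average number of entries per sample whose magnitude exceeds tol."""
    return float(np.mean(np.sum(np.abs(z_hat) > tol, axis=1)))


rng = np.random.default_rng(0)
d = 5
x, z_masked, A = simulate_partially_observed(n=2000, d=d, p_obs=0.5, rng=rng)

# Candidate 1: the true unmixing recovers m * z up to floating-point error.
z_true_unmix = x @ np.linalg.inv(A).T

# Candidate 2: an arbitrary invertible unmixing mixes the surviving latents
# into (almost surely) every coordinate, destroying the sparsity pattern.
B = rng.standard_normal((d, d))
z_other = x @ B.T

print("true unmixing, avg non-zeros:", avg_l0(z_true_unmix))
print("other unmixing, avg non-zeros:", avg_l0(z_other))
```

With each latent present in a sample with probability 0.5, the true unmixing gives roughly 2.5 non-zero entries per sample out of 5, while the arbitrary unmixing produces dense vectors for every sample that has at least one latent present. A sparsity-based objective therefore prefers the true unmixing among these two candidates; how such an objective is constrained and optimized in the proposed methods is not specified here.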