Causal representation learning aims at identifying high-level causal variables from perceptual data. Most methods assume that all latent causal variables are captured in the high-dimensional observations. We instead consider a partially observed setting, in which each measurement only provides information about a subset of the underlying causal state. Prior work has studied this setting with multiple domains or views, each depending on a fixed subset of latents. Here, we focus on learning from unpaired observations from a dataset with an instance-dependent partial observability pattern. Our main contribution is to establish two identifiability results for this setting: one for linear mixing functions without parametric assumptions on the underlying causal model, and one for piecewise linear mixing functions with Gaussian latent causal variables. Based on these insights, we propose two methods for estimating the underlying causal variables by enforcing sparsity in the inferred representation. Experiments on different simulated datasets and established benchmarks highlight the effectiveness of our approach in recovering the ground-truth latents.
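The intuition behind enforcing sparsity can be illustrated with a small simulation. The sketch below is not the estimator proposed in this work; it only assumes, for illustration, that partial observability acts as an instance-dependent binary mask on Gaussian latents before a linear mixing. The names `sample_masked_latents`, `avg_nonzeros`, the mixing matrix `A`, and the entangling map `R` are all hypothetical. It compares how many nonzero entries the ground-truth representation has, on average, against an invertible linear re-mixing of it.

```python
import random

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def sample_masked_latents(n_latents, keep_prob, rng):
    """Draw Gaussian latents and zero out a random, instance-dependent subset.

    Assumption: unobserved latents are modelled as multiplied by a 0/1 mask.
    """
    z = [rng.gauss(0.0, 1.0) for _ in range(n_latents)]
    mask = [1 if rng.random() < keep_prob else 0 for _ in range(n_latents)]
    return [m * z_i for m, z_i in zip(mask, z)]

def avg_nonzeros(vectors, tol=1e-9):
    """Average number of entries per vector with magnitude above tol."""
    return sum(sum(abs(v) > tol for v in vec) for vec in vectors) / len(vectors)

rng = random.Random(0)
n_latents, n_obs, n_samples = 3, 5, 2000

# Hypothetical linear mixing from latents to observations.
A = [[rng.gauss(0.0, 1.0) for _ in range(n_latents)] for _ in range(n_obs)]

# Hypothetical invertible map (determinant 1.125) that entangles the latents.
R = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]

masked = [sample_masked_latents(n_latents, 0.5, rng) for _ in range(n_samples)]
observations = [matvec(A, s) for s in masked]   # what a learner would see
entangled = [matvec(R, s) for s in masked]      # an alternative representation

sparsity_true = avg_nonzeros(masked)
sparsity_entangled = avg_nonzeros(entangled)
print(f"avg nonzeros, ground truth: {sparsity_true:.2f}")
print(f"avg nonzeros, entangled:    {sparsity_entangled:.2f}")
```

Under these assumptions the entangled representation has more nonzero entries on average than the ground-truth one, which suggests why a sparsity penalty on the inferred representation can favor the true latents over mixtures of them.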