Causal representation learning has emerged as a central focus of causal machine learning research. In particular, multi-domain datasets present a natural opportunity to showcase the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets because they rely on over-simplifying assumptions about the data, e.g., that each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents whose distributional properties (e.g., support, variance) remain stable across domains; this holds, for example, when each domain comes from a multi-node imperfect intervention. Leveraging this observation, we show that autoencoders that incorporate such invariances can provably identify the stable set of latents, separating it from the rest, across a variety of settings.
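To make the notion of a distributional property that "remains stable across domains" concrete, the sketch below measures how much the per-domain variance of a chosen set of latents changes across domains. This is a hypothetical illustration, not the paper's objective: the function name `variance_invariance_penalty`, the two-domain toy data, and the assumption that an imperfect intervention only rescales the variance of one latent are all introduced here for illustration. In an autoencoder, `z` would be the encoder's outputs and a term like this could be added to the reconstruction loss; here `z` is sampled directly.

```python
import numpy as np


def variance_invariance_penalty(z, domains, stable_dims):
    """Hypothetical regularizer (not the paper's method): penalizes how much the
    per-domain variance of each designated 'stable' latent changes across domains."""
    per_domain_var = np.stack(
        [z[domains == d][:, stable_dims].var(axis=0) for d in np.unique(domains)]
    )  # shape: (num_domains, len(stable_dims))
    # Variance over domains of each latent's variance, summed over the stable latents.
    return float(per_domain_var.var(axis=0).sum())


# Toy multi-domain latents (assumed setup): 3 latents, 2 domains, 10,000 samples each.
rng = np.random.default_rng(0)
n = 10_000
z_domain0 = rng.normal(size=(n, 3))
z_domain1 = rng.normal(size=(n, 3))
z_domain1[:, 2] *= 3.0  # assumed imperfect intervention: only latent 2's variance shifts
z = np.vstack([z_domain0, z_domain1])
domains = np.repeat([0, 1], n)

print(variance_invariance_penalty(z, domains, stable_dims=[0, 1]))  # close to 0
print(variance_invariance_penalty(z, domains, stable_dims=[2]))     # roughly 16
```

Latents 0 and 1 keep the same variance in both domains, so their penalty is near zero, while latent 2's variance moves from about 1 to about 9, giving a large penalty. A learned encoder that drives this kind of term to zero on some coordinates is one way to encode the invariance the abstract describes.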