Causal representation learning has become a central focus of causal machine learning research. In particular, multi-domain datasets present a natural opportunity to showcase the advantages of causal representation learning over standard unsupervised representation learning. While recent works have taken crucial steps towards learning causal representations, they often lack applicability to multi-domain datasets because they make oversimplifying assumptions about the data, e.g., that each domain comes from a different single-node perfect intervention. In this work, we relax these assumptions and capitalize on the following observation: there often exists a subset of latents for which certain distributional properties (e.g., support, variance) remain stable across domains; this property holds when, for example, each domain comes from a multi-node imperfect intervention. Leveraging this observation, we show that autoencoders incorporating such invariances can provably distinguish the stable set of latents from the rest across different settings.
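To make the notion of a distributional property that "remains stable across domains" concrete, the following is a minimal sketch of one possible invariance penalty on latent variances. It is an illustration under stated assumptions, not the method of this work: the function name `variance_invariance_penalty`, the toy data, and the choice of variance (rather than support) as the invariant property are all hypothetical. In an autoencoder, a term like this could be added to the reconstruction loss for the latent dimensions assumed to be stable.

```python
import numpy as np

def variance_invariance_penalty(latents_by_domain, stable_idx):
    """Hypothetical penalty: how much the per-domain variance of the
    chosen latent dimensions differs across domains (0 if identical)."""
    # per_domain_var[k, j] = variance of stable latent j in domain k
    per_domain_var = np.stack(
        [z[:, stable_idx].var(axis=0) for z in latents_by_domain]
    )
    # Spread of those variances across domains, summed over dimensions
    return float(per_domain_var.var(axis=0).sum())

rng = np.random.default_rng(0)
# Toy data (assumption): 3 domains, 2 latent dimensions.
# Dimension 0 keeps unit scale in every domain; dimension 1 changes scale.
domains = [
    rng.normal(size=(5000, 2)) * np.array([1.0, s]) for s in (0.5, 1.0, 3.0)
]

stable_penalty = variance_invariance_penalty(domains, [0])
unstable_penalty = variance_invariance_penalty(domains, [1])
print(stable_penalty, unstable_penalty)
```

The penalty is close to zero for the dimension whose variance is shared across domains and large for the one whose scale shifts, which is the signal such a regularizer would use to separate the two groups of latents.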