Causal representation learning has attracted significant research interest in the past few years as a means of improving model generalization and robustness. Causal representations of interventional image pairs (also called ``actionable counterfactuals'' in the literature) have the property that only the variables corresponding to scene elements affected by the intervention or action change between the start state and the end state. While most work in this area has focused on identifying and representing the variables of the scene under a causal model, fewer efforts have addressed representations of the interventions themselves. In this work, we show that an effective strategy for improving out-of-distribution (OOD) robustness is to focus on the representation of actionable counterfactuals in the latent space. Specifically, we propose that an intervention can be represented by a Causal Delta Embedding that is invariant to the visual scene and sparse in terms of the causal variables it affects. Leveraging this insight, we propose a method for learning causal representations from image pairs without any additional supervision. Experiments on the Causal Triplet challenge demonstrate that Causal Delta Embeddings are highly effective in OOD settings, significantly exceeding baseline performance on both synthetic and real-world benchmarks.
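
The abstract does not say how a Causal Delta Embedding is computed. As a minimal illustration of the two stated properties (invariance to the scene and sparsity over causal variables), the sketch below assumes the delta is the element-wise difference between the latent codes of the end and start images. The latent vectors, the function names, and the use of an L1 norm and cosine similarity as proxies are all hypothetical choices made for this example, not the authors' method.

```python
import math

def delta_embedding(z_before, z_after):
    """Assumed form of the delta: end-state latent minus start-state latent."""
    return [after - before for before, after in zip(z_before, z_after)]

def l1_norm(delta):
    """Sparsity proxy: a small L1 norm means few latent variables changed."""
    return sum(abs(d) for d in delta)

def cosine_similarity(u, v):
    """Invariance proxy: the same action in two scenes should give aligned deltas."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 4-dimensional latents for two different scenes.
# The same action is applied in both and is assumed to change only dimension 2.
scene_a_before = [0.5, -0.25, 0.0, 0.75]
scene_a_after = [0.5, -0.25, 1.0, 0.75]
scene_b_before = [-0.5, 0.75, 0.25, -0.25]
scene_b_after = [-0.5, 0.75, 1.25, -0.25]

delta_a = delta_embedding(scene_a_before, scene_a_after)
delta_b = delta_embedding(scene_b_before, scene_b_after)

print("delta for scene A:", delta_a)
print("delta for scene B:", delta_b)
print("L1 norm of delta A:", l1_norm(delta_a))
print("cosine similarity of the two deltas:", cosine_similarity(delta_a, delta_b))
```

In this toy setting the two scenes have different latents, yet the deltas coincide and are non-zero in a single coordinate. This is the behavior the abstract attributes to a scene-invariant, sparse representation of an intervention.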