We propose a novel unsupervised framework for \emph{Invariant Risk Minimization} (IRM), extending the concept of invariance to settings where labels are unavailable. Traditional IRM methods rely on labeled data to learn representations that are robust to distributional shifts across environments. In contrast, our approach redefines invariance through feature distribution alignment, enabling robust representation learning from unlabeled data. We introduce two methods within this framework: Principal Invariant Component Analysis (PICA), a linear method that extracts invariant directions under Gaussian assumptions, and Variational Invariant Autoencoder (VIAE), a deep generative model that separates environment-invariant and environment-dependent latent factors. Our approach is grounded in a novel ``unsupervised'' structural causal model and supports environment-conditioned sample generation and intervention. Empirical evaluations on synthetic data, modified versions of MNIST, and CelebA demonstrate that our methods capture invariant structure, preserve relevant information, and generalize across environments without access to labels.