Learning models whose predictions are invariant across multiple environments is a promising approach to out-of-distribution generalization. Such models are trained to extract features $X_{\text{inv}}$ for which the conditional distribution $Y \mid X_{\text{inv}}$ of the label given the extracted features does not change across environments. Invariant models are also expected to generalize to shifts in the marginal distribution $p(X_{\text{inv}})$ of the extracted features, a type of shift we call $\textit{invariant covariate shift}$. However, we show that previously proposed methods for learning invariant models underperform under invariant covariate shift: they either fail to learn invariant models, even for data generated from simple and well-studied linear-Gaussian models, or exhibit poor finite-sample performance. To alleviate these problems, we propose $\textit{weighted risk invariance}$ (WRI). Our framework imposes invariance of the loss across environments, subject to appropriate reweighting of the training examples. We show that WRI provably learns invariant models, i.e., discards spurious correlations, in linear-Gaussian settings. We propose a practical algorithm that implements WRI by learning the density $p(X_{\text{inv}})$ and the model parameters simultaneously, and we demonstrate empirically that WRI outperforms previous invariant learning methods under invariant covariate shift.
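To make "invariance of the loss across environments subject to appropriate reweightings" concrete, the following is one plausible formalization, not a statement of the paper's exact objective; the loss $\ell$, the reference density $q$, and the density-ratio form of the weights are illustrative assumptions:
$$
R^e_w(f) \;=\; \mathbb{E}_{(x,y)\sim p^e}\!\left[\, w^e(x_{\text{inv}})\, \ell\big(f(x), y\big) \right],
\qquad
w^e(x_{\text{inv}}) \;=\; \frac{q(x_{\text{inv}})}{p^e(x_{\text{inv}})},
$$
with WRI requiring $R^e_w(f) = R^{e'}_w(f)$ for all pairs of training environments $e, e'$. Under such weights, each environment's marginal over $x_{\text{inv}}$ is pulled toward the common reference $q$, so the equality compares losses on matched feature distributions rather than on each environment's raw marginal.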
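A minimal PyTorch-style sketch of such a reweighted-risk penalty follows. It is illustrative only, not the authors' implementation: the names `wri_penalty`, `ref_density`, and `env_densities` are hypothetical, a squared loss is assumed, and fixed plug-in density estimates stand in for the joint density learning the abstract describes.

```python
import torch
import torch.nn.functional as F

def weighted_risk(model, x, y, weights):
    """Reweighted empirical risk for one environment (squared loss assumed)."""
    per_example = F.mse_loss(model(x), y, reduction="none").view(len(x), -1).mean(dim=1)
    return (weights * per_example).mean()

def wri_penalty(model, envs, ref_density, env_densities, eps=1e-8):
    """Average reweighted risk plus a penalty on pairwise risk differences.

    envs          -- list of (x, y) batches, one per training environment
    ref_density   -- callable: estimate of a shared reference density q(x_inv)
    env_densities -- list of callables: estimates of p^e(x_inv), one per environment
    """
    risks = []
    for (x, y), p_e in zip(envs, env_densities):
        # Importance weights pull each environment's feature marginal toward q.
        # Here x stands in for the extracted invariant features; in practice the
        # densities would be evaluated on the model's invariant representation.
        w = ref_density(x) / p_e(x).clamp_min(eps)
        risks.append(weighted_risk(model, x, y, w))
    # Penalize all pairwise differences between reweighted environment risks.
    penalty = sum((r_i - r_j) ** 2
                  for i, r_i in enumerate(risks)
                  for r_j in risks[i + 1:])
    return torch.stack(risks).mean(), penalty
```

In training, one would minimize `risk_mean + lam * penalty` for some trade-off coefficient `lam`, with the densities either held as fixed plug-in estimates or, as the abstract describes, learned jointly with the model parameters.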