Domain generalization asks that models trained over a set of training environments generalize well to unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) have been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this work, we propose Invariant-feature Subspace Recovery (ISR): a new class of algorithms that achieve provable domain generalization in both classification and regression settings. First, in the binary classification setup of Rosenfeld et al. (2021), we show that our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieves provable domain generalization with $d_s+1$ training environments. Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ by using second-order moment information. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Next, we extend ISR-Mean to the more general setting of multi-class classification and propose ISR-Multiclass, which leverages class information and provably recovers the invariant-feature subspace with $\lceil d_s/k\rceil+1$ training environments for $k$-class classification. Finally, for regression problems, we propose ISR-Regression, which can identify the invariant-feature subspace with $d_s+1$ training environments. Empirically, we demonstrate the superior performance of our ISRs on synthetic benchmarks. Further, ISR can be used as a post-processing method for feature extractors such as neural nets.
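To make the first-order-moment idea behind ISR-Mean concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear data model in the spirit of Rosenfeld et al. (2021), in which the class-conditional mean of the observed features shifts across environments only along spurious directions, and it assumes $d_s$ is known. The function names `isr_mean_projection` and `make_toy_data`, the toy mixing matrices `A` and `B`, and all numeric settings are hypothetical choices made for illustration.

```python
import numpy as np


def isr_mean_projection(xs, ys, envs, d_spu, target_class=1):
    """Estimate a projection onto invariant directions from class-conditional means.

    Assumption: across environments, the mean of class `target_class` moves only
    along the d_spu spurious directions, so the directions in which the
    per-environment means do NOT vary are treated as invariant.
    """
    env_ids = np.unique(envs)
    # One class-conditional mean per environment, stacked into shape (E, d).
    means = np.stack([
        xs[(envs == e) & (ys == target_class)].mean(axis=0) for e in env_ids
    ])
    centered = means - means.mean(axis=0, keepdims=True)
    # PCA via eigendecomposition of the d x d scatter matrix of the means.
    # np.linalg.eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    d_inv = xs.shape[1] - d_spu
    # Keep the d_inv directions with the smallest variation across environments.
    return eigvecs[:, :d_inv].T


def make_toy_data(n_per_env=2000, d_inv=2, noise=0.1, seed=0):
    """Hypothetical toy data: x = A z_inv + B z_spu with labels y in {-1, +1}."""
    rng = np.random.default_rng(seed)
    # Spurious class means, one row per environment (d_spu + 1 = 4 environments).
    mu_spu = np.array([[2.0, 0.0, 0.0],
                       [0.0, 2.0, 0.0],
                       [0.0, 0.0, 2.0],
                       [0.0, 0.0, 0.0]])
    n_env, d_spu = mu_spu.shape
    d = d_inv + d_spu
    # Random orthogonal mixing; A mixes invariant latents, B mixes spurious ones.
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    A, B = q[:, :d_inv], q[:, d_inv:]
    mu_inv = np.ones(d_inv)  # invariant class mean, shared by all environments
    xs, ys, envs = [], [], []
    for e in range(n_env):
        y = rng.choice([-1, 1], size=n_per_env)
        z_inv = y[:, None] * mu_inv + noise * rng.normal(size=(n_per_env, d_inv))
        z_spu = y[:, None] * mu_spu[e] + noise * rng.normal(size=(n_per_env, d_spu))
        xs.append(z_inv @ A.T + z_spu @ B.T)
        ys.append(y)
        envs.append(np.full(n_per_env, e))
    return np.concatenate(xs), np.concatenate(ys), np.concatenate(envs), A, B


xs, ys, envs, A, B = make_toy_data()
P = isr_mean_projection(xs, ys, envs, d_spu=3)
# Largest entry of P @ B: how much spurious signal survives the projection.
print("max |P B| =", np.abs(P @ B).max())
```

In this sketch, a downstream classifier would be fit on the projected features `xs @ P.T` rather than on `xs`. The toy setup uses exactly $d_s+1 = 4$ environments, matching the count stated for ISR-Mean; with fewer environments the centered means would span fewer than $d_s$ directions and the spurious subspace could not be fully identified this way.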