A common explanation for the failure of out-of-distribution (OOD) generalization is that models trained with empirical risk minimization (ERM) learn spurious features instead of invariant features. However, several recent studies have challenged this explanation, finding that deep networks may already learn features good enough for OOD generalization. Despite this apparent contradiction, we show theoretically that ERM essentially learns both spurious and invariant features, but tends to learn spurious features faster when the spurious correlation is stronger. Moreover, when the ERM-learned features are fed to OOD objectives, the quality of the learned invariant features significantly affects the final OOD performance, because OOD objectives rarely learn new features. ERM feature learning can therefore be a bottleneck for OOD generalization. To alleviate this reliance, we propose Feature Augmented Training (FeAT), which forces the model to learn richer features ready for OOD generalization. FeAT iteratively augments the model to learn new features while retaining the features it has already learned. In each round, the retention and augmentation operations are performed on different subsets of the training data that capture distinct features. Extensive experiments show that FeAT effectively learns richer features, thereby boosting the performance of various OOD objectives.