Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that, perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that generalizes out-of-distribution is insufficient for the generalization of the student network. Then, through a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet previously unexplored feature-learning proclivity of neural networks, which we term feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism differs essentially from the prevailing narrative in the literature that attributes generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature-learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.