Learning representations that generalize under distribution shifts is critical for building robust machine learning models. However, despite significant efforts in recent years, algorithmic advances in this direction have been limited. In this work, we seek to understand the fundamental difficulty of out-of-distribution generalization with deep neural networks. We first empirically show that, perhaps surprisingly, even allowing a neural network to explicitly fit the representations obtained from a teacher network that can generalize out-of-distribution is insufficient for the student network to generalize out of distribution. Then, through a theoretical study of two-layer ReLU networks optimized by stochastic gradient descent (SGD) under a structured feature model, we identify a fundamental yet unexplored feature learning proclivity of neural networks, feature contamination: neural networks can learn uncorrelated features together with predictive features, resulting in generalization failure under distribution shifts. Notably, this mechanism essentially differs from the prevailing narrative in the literature that attributes the generalization failure to spurious correlations. Overall, our results offer new insights into the non-linear feature learning dynamics of neural networks and highlight the necessity of considering inductive biases in out-of-distribution generalization.
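To make the empirical setup described above concrete, the following is a minimal sketch, assuming a PyTorch-style implementation, of the representation-distillation experiment: a student network is trained with SGD to explicitly fit the frozen representations of a teacher network. All module names, dimensions, and hyperparameters here are illustrative assumptions and not the paper's exact configuration.

```python
# Illustrative sketch (not the authors' exact protocol): a student two-layer
# ReLU network is trained with SGD to match the frozen representations of a
# teacher network via an MSE distillation loss.
import torch
import torch.nn as nn

class TwoLayerReLU(nn.Module):
    """Two-layer ReLU feature extractor (hypothetical dimensions)."""
    def __init__(self, in_dim=128, hidden_dim=256):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())

    def forward(self, x):
        return self.feature(x)

teacher = TwoLayerReLU()           # assumed to generalize out-of-distribution
student = TwoLayerReLU()
for p in teacher.parameters():     # freeze the teacher's representations
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=1e-2)
mse = nn.MSELoss()

x = torch.randn(512, 128)          # stand-in for in-distribution training inputs
for step in range(100):
    loss = mse(student(x), teacher(x))  # explicitly fit teacher representations
    opt.zero_grad()
    loss.backward()
    opt.step()

# Even when this distillation loss is driven low on the training distribution,
# the abstract reports that the student's representations can still fail to
# generalize under distribution shift.
```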