Simplicity bias, the propensity of deep models to over-rely on simple features, has been identified as a potential reason for the limited out-of-distribution (OOD) generalization of neural networks (Shah et al., 2020). Despite its important implications, this phenomenon has been theoretically confirmed and characterized only under strong dataset assumptions, such as linear separability (Lyu et al., 2021). In this work, we characterize simplicity bias for general datasets in the context of two-layer neural networks initialized with small weights and trained with gradient flow. Specifically, we prove that in the early training phases, network features cluster around a few directions that do not depend on the width of the hidden layer. Furthermore, for datasets with an XOR-like pattern, we precisely identify the learned features and demonstrate that simplicity bias intensifies during later training stages. These results indicate that features learned in the middle stages of training may be more useful for OOD transfer. We support this hypothesis with experiments on image data.
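The setting described above can be illustrated with a minimal sketch, not the paper's exact construction: a two-layer ReLU network with small random initialization, trained by full-batch gradient descent (a discretization of gradient flow) on the four-point XOR dataset, where the label is the product of the input signs. All concrete choices below (width `m`, learning rate, step count, initialization scale) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four-point XOR-like dataset: label = product of the input signs.
X = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
y = X[:, 0] * X[:, 1]               # labels: +1, -1, -1, +1
n, d, m = len(X), 2, 50             # m = hidden-layer width

# Two-layer network f(x) = sum_j a_j * relu(w_j . x), small initialization.
W = 0.1 * rng.standard_normal((m, d))
a = 0.1 * rng.standard_normal(m)

lr, steps = 0.05, 5000
for _ in range(steps):
    pre = X @ W.T                   # (n, m) pre-activations
    h = np.maximum(pre, 0.0)        # ReLU features
    err = h @ a - y                 # residuals under squared loss
    grad_a = (2.0 / n) * h.T @ err
    grad_W = (2.0 / n) * ((err[:, None] * (pre > 0) * a[None, :]).T @ X)
    a -= lr * grad_a
    W -= lr * grad_W

f_final = np.maximum(X @ W.T, 0.0) @ a
loss = float(np.mean((f_final - y) ** 2))
preds = np.sign(f_final)

# Directions of the trained neurons: for XOR-like data they tend to
# cluster around a few directions (the diagonals +-(1,1)/sqrt(2) and
# +-(1,-1)/sqrt(2)), regardless of the width m.
norms = np.linalg.norm(W, axis=1)
keep = norms > 0.1 * norms.max()
dirs = W[keep] / norms[keep, None]
diagonals = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]]) / np.sqrt(2.0)
alignment = np.abs(dirs @ diagonals.T).max(axis=1)  # cosine to nearest diagonal
print(f"loss={loss:.4f}, mean alignment with diagonals={alignment.mean():.3f}")
```

Printing `alignment` per neuron (rather than the mean) makes the clustering visible: most significant neurons end up nearly parallel to one of the four diagonal directions, even as `m` is varied.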