The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. The phase transition phenomenon is confirmed through numerical experiments.
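For concreteness, the setting described above can be sketched as follows. The notation is an assumption introduced for illustration and does not come from the abstract: $X \in \mathbb{R}^{n \times d}$ is the data matrix, $y \in \mathbb{R}^{n}$ the labels, $m$ the number of hidden neurons, $w_j$ and $\alpha_j$ the first- and second-layer weights, $\beta > 0$ the weight decay strength, and $(\cdot)_+$ the ReLU. The squared loss and the single planted neuron $w^\ast$ are likewise illustrative choices.

```latex
% Two-layer ReLU network trained with weight decay (illustrative notation).
\min_{\{w_j,\, \alpha_j\}_{j=1}^{m}} \;
  \frac{1}{2} \Big\| \sum_{j=1}^{m} (X w_j)_+ \, \alpha_j - y \Big\|_2^2
  + \frac{\beta}{2} \sum_{j=1}^{m} \big( \| w_j \|_2^2 + \alpha_j^2 \big)

% Planted model with one neuron (assumed form): labels come from a
% single ReLU unit with unknown weights w^*.
y = (X w^\ast)_+

% Recovery asks whether the trained network reproduces the planted neuron,
% no matter how large m is. The phase transition is stated in terms of the
% ratio n/d; the abstract does not give the threshold value.
```

In this reading, "simple models" corresponds to solutions in which only a few of the $m$ neurons are active, which is the sense in which the problem parallels sparse recovery in compressed sensing.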