Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice. This has been highlighted recently by the benign overfitting phenomenon: when neural networks become sufficiently large to interpolate the dataset perfectly, model performance appears to improve with increasing model size, in apparent contradiction with the well-known bias-variance tradeoff. While such phenomena have proven challenging to study theoretically for general models, the recently proposed Interpolating Information Criterion (IIC) provides a valuable theoretical framework for examining the performance of overparameterized models. Using the IIC, a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence generalization performance in the interpolating regime. From the provided bound, we quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model, optimizer, and parameter-initialization scheme; the spectrum of the empirical neural tangent kernel; the curvature of the loss landscape; and the noise present in the data.