Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The well-known double descent phenomenon suggests that, in the overparameterized regime, the test error curve of neural networks decreases monotonically as model size grows and eventually converges to a non-zero constant. This work aims to explain the theoretical mechanism underlying this tail behavior and to study the statistical consistency of deep overparameterized neural networks across a range of learning tasks, including regression and classification. First, we prove that as the number of parameters increases, the approximation error decreases monotonically, while explicit or implicit regularization (e.g., weight decay) keeps the generalization error nonzero but bounded. Consequently, the overall error curve eventually converges to a constant determined by the bounded generalization error and the optimization error. Second, we prove that deep overparameterized neural networks are statistically consistent across multiple learning tasks when regularization techniques are used. Our theoretical findings agree with numerical experiments and provide a perspective for understanding the generalization behavior of overparameterized neural networks.
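For concreteness, the tail argument can be read through the standard excess-risk decomposition sketched below; the notation (hypothesis class $\mathcal{F}_m$ of networks with $m$ parameters, weight-decay-constrained subclass $\mathcal{F}_m(\lambda)$, population risk $\mathcal{R}$, empirical risk $\hat{\mathcal{R}}_n$, regularized estimator $\hat f_{n,m}$, target predictor $f^*$) is illustrative and not necessarily the paper's own.

% A minimal sketch of the three-term error decomposition, assuming both the
% regularized estimator \hat f_{n,m} and the best in-class predictor
% f_m^* \in \arg\min_{f \in \mathcal{F}_m(\lambda)} \mathcal{R}(f) lie in \mathcal{F}_m(\lambda).
\begin{align*}
\mathcal{R}(\hat f_{n,m}) - \mathcal{R}(f^*)
  \;\le\;& \underbrace{\mathcal{R}(f_m^*) - \mathcal{R}(f^*)}_{\text{approximation error}}
  \;+\; \underbrace{2\sup_{f \in \mathcal{F}_m(\lambda)} \bigl|\mathcal{R}(f) - \hat{\mathcal{R}}_n(f)\bigr|}_{\text{generalization error}}
  \;+\; \underbrace{\hat{\mathcal{R}}_n(\hat f_{n,m}) - \inf_{f \in \mathcal{F}_m(\lambda)} \hat{\mathcal{R}}_n(f)}_{\text{optimization error}}.
\end{align*}

Under this sketch, the approximation term decreases monotonically as $m$ grows because the class $\mathcal{F}_m$ is nested, while the regularization constraint keeps the uniform generalization term bounded; the sum of the last two terms then determines the constant to which the tail of the error curve converges.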