Neural networks have shown remarkable success, especially in overparameterized or "large" models. Despite increasing empirical evidence and intuitive understanding, a formal mathematical justification for the behavior of such models, particularly regarding overfitting, remains incomplete. In this paper, we propose a general regularization framework to study the Mean Integrated Squared Error (MISE) of neural networks. The framework covers many commonly used networks and penalties, including ReLU and Sigmoid activations and $L^1$ and $L^2$ penalties. Based on our framework, we find that the MISE curve has two possible shapes: double descent and monotone decrease. The latter phenomenon is new to the literature, and we study the causes of both phenomena theoretically. These findings challenge conventional statistical modeling frameworks and broaden recent results on the double descent phenomenon in neural networks.
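For reference, a minimal sketch of the two quantities the abstract refers to, written in generic notation (the function class $\mathcal{F}_n$, penalty $J$, tuning parameter $\lambda$, and true regression function $f_0$ are illustrative placeholders, not the paper's exact setup): a penalized neural network estimator and its MISE take the form
$$
\hat{f}_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}_n} \; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 + \lambda\, J(f),
\qquad
\mathrm{MISE}(\hat{f}_n) = \mathbb{E}\!\int \bigl(\hat{f}_n(x) - f_0(x)\bigr)^2\,dx,
$$
where $J$ could be, for instance, the $L^1$ or $L^2$ norm of the network weights.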