Neural networks (NNs) are central to modern machine learning and achieve state-of-the-art results in many applications. However, the relationship between the geometry of the loss and generalization is still not well understood. Near a critical point, the loss function is well approximated by the quadratic form obtained from its second-order Taylor expansion. The coefficients of the quadratic term form the Hessian matrix, whose eigenspectrum allows us to evaluate the sharpness of the loss at that critical point. Extensive research suggests that flat critical points generalize better, whereas sharp ones lead to higher generalization error. Evaluating sharpness requires the Hessian eigenspectrum, but the characteristic equation of a general matrix has no closed-form solution. Most existing studies of loss sharpness therefore rely on numerical approximation methods. Existing closed-form analyses of the eigenspectrum are largely restricted to simplified architectures, such as linear or ReLU-activated networks; consequently, theoretical analysis of smooth nonlinear multilayer neural networks remains scarce. Against this background, this study focuses on smooth nonlinear multilayer neural networks and derives a closed-form upper bound on the maximum eigenvalue of the Hessian of the cross-entropy loss by leveraging the Wolkowicz-Styan bound. Specifically, the derived upper bound is expressed as a function of the affine transformation parameters, the hidden layer dimensions, and the degree of orthogonality among the training samples. The primary contribution of this paper is an analytical characterization of loss sharpness in smooth nonlinear multilayer neural networks via a closed-form expression, which avoids explicit numerical computation of the eigenspectrum. We hope that this work provides a small yet meaningful step toward unraveling the mysteries of deep learning.
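To make the role of the Wolkowicz-Styan bound concrete, the following minimal sketch evaluates the generic matrix form of the bound, which needs only the trace of a symmetric matrix and the trace of its square. It is not the bound derived in this paper, which is expressed in terms of network parameters; the function name `wolkowicz_styan_bounds` and the example matrices are illustrative assumptions. For a real symmetric n x n matrix A with m = tr(A)/n and s^2 = tr(A^2)/n - m^2, the largest eigenvalue lies between m + s/sqrt(n-1) and m + s*sqrt(n-1).

```python
import math


def wolkowicz_styan_bounds(a):
    """Return (lower, upper) bounds on the largest eigenvalue of a
    real symmetric matrix given as a list of rows.

    Hypothetical helper for illustration only; it assumes `a` is
    square and symmetric and does not check symmetry.
    """
    n = len(a)
    if n < 2:
        raise ValueError("the bound needs a matrix of size at least 2")

    # tr(A): sum of the diagonal entries.
    trace = sum(a[i][i] for i in range(n))
    # For symmetric A, tr(A^2) equals the sum of all squared entries.
    trace_sq = sum(a[i][j] ** 2 for i in range(n) for j in range(n))

    m = trace / n
    # Clamp at zero to guard against tiny negative values from rounding.
    s = math.sqrt(max(trace_sq / n - m * m, 0.0))

    lower = m + s / math.sqrt(n - 1)
    upper = m + s * math.sqrt(n - 1)
    return lower, upper


# Example: [[2, 1], [1, 2]] has eigenvalues 1 and 3.
# For n = 2 the lower and upper bounds coincide with the largest eigenvalue.
print(wolkowicz_styan_bounds([[2.0, 1.0], [1.0, 2.0]]))  # (3.0, 3.0)
```

No eigendecomposition is performed here: both quantities are sums over matrix entries, which is what makes a closed-form sharpness bound possible once the trace terms can be written analytically.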