We introduce a methodology for analyzing neural networks through the lens of layer-wise Hessian matrices. The local Hessian of a functional block (layer) is defined as the matrix of second derivatives of a scalar objective, typically the training loss, with respect to that layer's parameters. This construction provides a formal tool for characterizing the local geometry of the parameter space. We show that spectral properties of local Hessians, such as the eigenvalue distribution, reveal quantitative patterns associated with overfitting, underparameterization, and expressivity in neural network architectures. We conduct an extensive empirical study comprising 111 experiments across 37 datasets. The results demonstrate consistent structural regularities in the evolution of local Hessians during training and highlight correlations between their spectra and generalization performance. These findings lay a foundation for using local geometric analysis to guide the diagnosis and design of deep neural networks. The proposed framework connects optimization geometry with functional behavior and offers practical guidance for improving network architectures and training stability.
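For concreteness, the following is a minimal sketch of how a local Hessian $H^{(\ell)} = \nabla^2_{\theta_\ell} \mathcal{L}(\theta)$ and its spectrum might be computed for one layer; the PyTorch model, synthetic data, and layer choice are illustrative assumptions, not details taken from the study itself.

```python
# Minimal sketch of computing a layer-wise ("local") Hessian spectrum.
# Assumptions: a tiny synthetic model and dataset, chosen only so the
# full Hessian of a single layer is cheap to materialize explicitly.
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(4, 8), nn.Tanh(), nn.Linear(8, 3))
x = torch.randn(16, 4)          # synthetic inputs (illustrative)
y = torch.randint(0, 3, (16,))  # synthetic labels (illustrative)
loss_fn = nn.CrossEntropyLoss()

def local_hessian(layer_param):
    """Hessian of the loss w.r.t. a single layer's parameter tensor."""
    n = layer_param.numel()
    loss = loss_fn(model(x), y)
    # First derivatives, kept in the graph so we can differentiate again.
    (grad,) = torch.autograd.grad(loss, layer_param, create_graph=True)
    grad = grad.reshape(-1)
    rows = []
    for i in range(n):  # one backward pass per row; fine for small layers
        (row,) = torch.autograd.grad(grad[i], layer_param, retain_graph=True)
        rows.append(row.reshape(-1))
    return torch.stack(rows)  # (n, n) symmetric matrix

H = local_hessian(model[0].weight)   # local Hessian of the first layer
eigvals = torch.linalg.eigvalsh(H)   # eigenvalue distribution of the block
print(f"shape={tuple(H.shape)}, "
      f"lambda_max={eigvals[-1].item():.4f}, "
      f"negative eigenvalues={(eigvals < 0).sum().item()}")
```

The explicit row-by-row construction is only viable for small blocks; for layers with many parameters, the spectrum is typically estimated from Hessian-vector products (e.g., via the Lanczos iteration) rather than by forming the matrix.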