We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability. We derive error bounds for local minimizers in terms of a data-dependent effective dimension. This dimension reflects the geometry of the gradient model at the trained parameters through two quantities: the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, it recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization (the typical choice in neural tangent kernel analyses). We further bound this effective dimension via the covering complexity of the gradient features, which yields guarantees that depend on the learned geometry rather than on the parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with the intrinsic dimension, while for one-hidden-layer ReLU networks the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and the observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles via the Brascamp-Lieb inequality under strongly log-concave noise.
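To make the linear-case quantity concrete, here is a minimal sketch of the classical effective dimension computed from a Jacobian evaluated at some fixed parameters. It covers only the curvature-free case; the residual-curvature term of the full data-dependent version is not specified here and is omitted. Several things are assumptions for illustration rather than the paper's definitions: the normalization \(\hat\Sigma = J^\top J / n\), the form \(d_{\mathrm{eff}}(\lambda) = \operatorname{tr}(\hat\Sigma(\hat\Sigma + \lambda I)^{-1})\), and the hypothetical helper names `effective_dimension` and `effective_dimension_gram`. The demo builds a Jacobian whose rows lie in a low-dimensional subspace, showing how the quantity tracks that geometry rather than the parameter count \(p\).

```python
import numpy as np


def effective_dimension(J: np.ndarray, lam: float) -> float:
    """Classical effective dimension tr(S (S + lam I)^{-1}) with S = J^T J / n (assumed normalization).

    J is an (n, p) Jacobian: row i is the gradient of the model output at
    sample i with respect to the p parameters. Uses the singular values of J,
    so no p x p matrix is ever formed.
    """
    n = J.shape[0]
    s2 = np.linalg.svd(J, compute_uv=False) ** 2  # nonzero eigenvalues of J J^T (= n * eigenvalues of S)
    return float(np.sum(s2 / (s2 + n * lam)))


def effective_dimension_gram(J: np.ndarray, lam: float) -> float:
    """Same quantity from the n x n Jacobian Gram matrix G = J J^T: tr(G (G + n lam I)^{-1})."""
    n = J.shape[0]
    G = J @ J.T
    # solve(A, G) returns A^{-1} G, which has the same trace as G A^{-1} because A and G commute
    return float(np.trace(np.linalg.solve(G + n * lam * np.eye(n), G)))


# Hypothetical "compressed" Jacobian: n samples, p parameters, rows confined to an r-dim subspace.
rng = np.random.default_rng(0)
n, p, r = 200, 5000, 3
J = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))
d = effective_dimension(J, lam=1e-3)
print(f"effective dimension: {d:.4f} (p = {p}, subspace rank r = {r})")  # never exceeds r
```

Both routines compute the same number; the SVD form is convenient when \(p \gg n\), while the Gram form makes the link to the kernel view explicit.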