Empirical evidence suggests that for a variety of overparameterized nonlinear models, most notably in neural network training, the growth of the loss around a minimizer strongly impacts its performance. Flat minima -- those around which the loss grows slowly -- appear to generalize well. This work takes a step towards understanding this phenomenon by focusing on the simplest class of overparameterized nonlinear models: those arising in low-rank matrix recovery. We analyze overparameterized matrix and bilinear sensing, robust PCA, covariance matrix estimation, and single hidden layer neural networks with quadratic activation functions. In all cases, we show that flat minima, measured by the trace of the Hessian, exactly recover the ground truth under standard statistical assumptions. For matrix completion, we establish weak recovery, although empirical evidence suggests exact recovery holds here as well. We conclude with synthetic experiments that illustrate our findings and discuss the effect of depth on flat solutions.
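To make the flatness measure concrete, the following is a minimal numerical sketch, not the setting analyzed in this work. It assumes the simplest possible instance: a fully observed matrix M (identity measurement map) and the loss f(L, R) = 0.5 * ||L R^T - M||_F^2, with hypothetical helper names throughout. At any point with L R^T = M, the trace of the Hessian of this particular loss equals d1 * ||R||_F^2 + d2 * ||L||_F^2, so among all exact factorizations the flattest ones are the norm-balanced ones, and for square M the minimal trace is 2 * d * ||M||_* (the nuclear norm). The code checks the closed form against finite differences and compares a balanced factorization with a rescaled one.

```python
import numpy as np

def loss(L, R, M):
    """Least-squares factorization loss 0.5 * ||L R^T - M||_F^2 (illustrative assumption)."""
    return 0.5 * np.linalg.norm(L @ R.T - M) ** 2

def hessian_trace_fd(L, R, M, h=1e-3):
    """Trace of the Hessian of `loss` in (L, R), via central second differences."""
    f0 = loss(L, R, M)
    total = 0.0
    for X, is_left in ((L, True), (R, False)):
        for idx in np.ndindex(*X.shape):
            E = np.zeros_like(X)
            E[idx] = h  # perturb a single coordinate of L or R
            if is_left:
                fp, fm = loss(L + E, R, M), loss(L - E, R, M)
            else:
                fp, fm = loss(L, R + E, M), loss(L, R - E, M)
            total += (fp - 2.0 * f0 + fm) / h**2
    return total

def hessian_trace_closed_form(L, R):
    """Valid where L R^T = M: d1 * ||R||_F^2 + d2 * ||L||_F^2."""
    d1, d2 = L.shape[0], R.shape[0]
    return d1 * np.linalg.norm(R) ** 2 + d2 * np.linalg.norm(L) ** 2

rng = np.random.default_rng(0)
d, r, k = 6, 2, 4  # ambient dimension, true rank, overparameterized rank k > r
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d))

# Balanced factorization from the SVD, zero-padded to k columns.
U, s, Vt = np.linalg.svd(M)
L_bal = np.zeros((d, k))
R_bal = np.zeros((d, k))
L_bal[:, :r] = U[:, :r] * np.sqrt(s[:r])
R_bal[:, :r] = Vt[:r].T * np.sqrt(s[:r])

# An unbalanced factorization of the same matrix: rescale the two factors.
c = 3.0
L_unb, R_unb = c * L_bal, R_bal / c

nuclear_norm = s.sum()
trace_bal = hessian_trace_closed_form(L_bal, R_bal)
trace_unb = hessian_trace_closed_form(L_unb, R_unb)

print("finite-difference trace (balanced):", hessian_trace_fd(L_bal, R_bal, M))
print("closed-form trace (balanced):      ", trace_bal)
print("2 * d * nuclear norm:              ", 2 * d * nuclear_norm)
print("closed-form trace (unbalanced):    ", trace_unb)
```

Both factorizations fit M exactly, yet the rescaled pair has a strictly larger Hessian trace. This only illustrates how a trace-based flatness measure separates interpolating solutions; the recovery guarantees described above concern measurement models such as matrix sensing, which this sketch does not implement.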