We consider the optimization problem associated with training two-layer ReLU networks with \(d\) inputs under the squared loss, where the labels are generated by a target network. Recent work has identified two distinct classes of infinite families of minima: one whose training loss vanishes in the high-dimensional limit, and another whose loss remains bounded away from zero. The latter family is empirically avoided by stochastic gradient descent, hence \emph{hidden}, motivating the search for analytic criteria that distinguish hidden from non-hidden minima. A key challenge is that prior analyses have shown the Hessian spectra at hidden and non-hidden minima to coincide up to terms of order \(O(d^{-1/2})\), seemingly limiting the discriminative power of spectral methods. We therefore take a different route, studying instead certain curves along which the loss is locally minimized. Our main result shows that arcs emanating from hidden minima exhibit distinctive structural and symmetry properties, arising precisely from \(\Omega(d^{-1/2})\) eigenvalue contributions that are absent from earlier analyses.
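For concreteness, the objective in question can be written in standard teacher-student form; the following is a minimal sketch, where the student width \(k\), the weight notation \(W = (w_1, \dots, w_k)\), \(V = (v_1, \dots, v_k)\), and the Gaussian input distribution are our own assumptions and may differ in detail from the precise setup studied in the paper.
% Sketch of the teacher-student squared loss (notation assumed, not taken verbatim from the paper):
% both student and teacher are two-layer ReLU networks with unit output weights.
\[
  L(W) \;=\; \tfrac{1}{2}\,\mathbb{E}_{x \sim \mathcal{N}(0, I_d)}
  \Bigl[\,\Bigl( \sum_{i=1}^{k} \sigma(w_i^\top x) \;-\; \sum_{j=1}^{k} \sigma(v_j^\top x) \Bigr)^{2}\,\Bigr],
  \qquad \sigma(z) = \max(z, 0),
\]
where \(W\) collects the trainable hidden-layer weights and \(V\) is the fixed target network generating the labels.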