We study the population loss landscape of two-layer ReLU networks of the form $\sum_{k=1}^K \mathrm{ReLU}(w_k^\top x)$ in a realisable teacher-student setting with Gaussian covariates. We show that local minima admit an exact low-dimensional representation in terms of summary statistics, yielding a sharp and interpretable characterisation of the landscape. We further establish a direct link with one-pass SGD: local minima correspond to attractive fixed points of the dynamics in summary statistics space. This perspective reveals a hierarchical structure of minima: they are typically isolated in the well-specified regime, but become connected by flat directions as network width increases. In this overparameterised regime, global minima become increasingly accessible, attracting the dynamics and making convergence to spurious solutions less frequent. Overall, our results reveal intrinsic limitations of common simplifying assumptions, which may miss essential features of the loss landscape even in minimal neural network models.
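To make the phrase "representation in terms of summary statistics" concrete, here is a minimal sketch, not the authors' code, of how the population loss of this model can be written using only the overlaps $Q = WW^\top$, $M = WW_*^\top$ and $P = W_*W_*^\top$. It relies on the standard closed form of $\mathbb{E}[\mathrm{ReLU}(a)\,\mathrm{ReLU}(b)]$ for jointly Gaussian $(a, b)$ (the degree-1 arc-cosine kernel). Assumptions made here for illustration: $x \sim \mathcal{N}(0, I_d)$, a teacher with $K_*$ units and a student of width $K \ge K_*$, the loss normalised as $\tfrac12\mathbb{E}[(f_W - f_{W_*})^2]$, and nonzero weight vectors. The function names (`relu_corr`, `population_loss`, `summary_stats`, `monte_carlo_loss`) and the sizes used are hypothetical.

```python
import numpy as np


def relu_corr(qa, qb, c):
    """E[ReLU(a) ReLU(b)] for zero-mean jointly Gaussian (a, b) with
    Var(a) = qa, Var(b) = qb, Cov(a, b) = c (degree-1 arc-cosine kernel).
    Assumes qa, qb > 0, i.e. no zero weight vectors."""
    norm = np.sqrt(qa * qb)
    rho = np.clip(c / norm, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    theta = np.arccos(rho)
    return norm / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * rho)


def population_loss(Q, M, P):
    """0.5 * E[(f_W(x) - f_W*(x))^2] for x ~ N(0, I_d), expressed only through
    Q = W W^T (K x K), M = W W*^T (K x K*) and P = W* W*^T (K* x K*)."""
    q, p = np.diag(Q), np.diag(P)
    student = relu_corr(q[:, None], q[None, :], Q).sum()  # E[f_W(x)^2]
    cross = relu_corr(q[:, None], p[None, :], M).sum()    # E[f_W(x) f_W*(x)]
    teacher = relu_corr(p[:, None], p[None, :], P).sum()  # E[f_W*(x)^2]
    return 0.5 * (student - 2.0 * cross + teacher)


def summary_stats(W, W_star):
    """Overlaps between student rows w_k and teacher rows w*_j."""
    return W @ W.T, W @ W_star.T, W_star @ W_star.T


def monte_carlo_loss(W, W_star, n=200_000, seed=0):
    """Direct sample estimate of the same loss, used to check the closed form."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, W.shape[1]))
    f_student = np.maximum(X @ W.T, 0.0).sum(axis=1)
    f_teacher = np.maximum(X @ W_star.T, 0.0).sum(axis=1)
    return 0.5 * np.mean((f_student - f_teacher) ** 2)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, K, K_star = 10, 3, 2  # hypothetical sizes: student wider than teacher
    W_star = rng.standard_normal((K_star, d)) / np.sqrt(d)
    W = rng.standard_normal((K, d)) / np.sqrt(d)
    print("closed form:", population_loss(*summary_stats(W, W_star)))
    print("Monte Carlo:", monte_carlo_loss(W, W_star))

    # ReLU is positively homogeneous, so splitting teacher unit w*_1 into
    # alpha*w*_1 and (1 - alpha)*w*_1 reproduces the teacher exactly:
    # zero loss for every alpha in [0, 1], i.e. a flat direction of global minima.
    for alpha in (0.2, 0.5, 0.8):
        W_split = np.vstack([alpha * W_star[0], (1 - alpha) * W_star[0], W_star[1]])
        print(alpha, population_loss(*summary_stats(W_split, W_star)))
```

The loss depends on the $d$-dimensional weights only through $K(K+1)/2 + KK_* + K_*(K_*+1)/2$ overlap entries, which is the kind of dimensional reduction the abstract refers to. The splitting loop is one simple way to see how extra width ($K > K_*$) can turn an isolated global minimum into a connected family; it is an illustration, not a reproduction of the paper's analysis.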