Under mild assumptions, we investigate the structure of the loss landscape of two-layer neural networks near global minima, determine the set of parameters that gives perfect generalization, and fully characterize the gradient flows around this set. Using novel techniques, our work uncovers some simple aspects of the otherwise complicated loss landscape and reveals how the model, target function, samples, and initialization each affect the training dynamics differently. Based on these results, we also explain why (overparametrized) neural networks can generalize well.