This paper examines the gradient flow dynamics of two-homogeneous neural networks under small initializations, where all weights are initialized near the origin. For both the square and logistic losses, it is shown that, for sufficiently small initializations, the gradient flow dynamics remain near the origin long enough for the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function, which quantifies the correlation between the output of the neural network and the corresponding labels in the training data set. For the square loss, neural networks initialized close to the origin have been observed to undergo saddle-to-saddle dynamics; motivated by this, the paper also establishes a similar directional convergence among small-magnitude weights in the neighborhood of certain saddle points.
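As an illustrative sketch (the abstract does not fix notation, so the symbols below are assumptions): writing $\mathcal{H}(x; w)$ for the output of the two-homogeneous network with weights $w$ and $\{(x_i, y_i)\}_{i=1}^{n}$ for the training set, the directional-convergence claim can be phrased in terms of the constrained maximization of a neural correlation function,
$$
\mathcal{N}(w) \;=\; \sum_{i=1}^{n} y_i \,\mathcal{H}(x_i; w),
\qquad
\max_{\|w\| = 1} \; \mathcal{N}(w),
$$
so that, for a sufficiently small initialization scale, the normalized weights $w(t)/\|w(t)\|$ approximately converge to a KKT point of this constrained problem while $w(t)$ is still near the origin.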