This paper examines the gradient flow dynamics of two-homogeneous neural networks under small initializations, where all weights are initialized near the origin. For both square and logistic losses, it shows that, for sufficiently small initializations, the gradient flow spends enough time in the neighborhood of the origin for the weights of the neural network to approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of a neural correlation function, which quantifies the correlation between the output of the neural network and the corresponding labels in the training data set. For square loss, neural networks have been observed to undergo saddle-to-saddle dynamics when initialized close to the origin. Motivated by this observation, the paper also shows a similar directional convergence among weights of small magnitude in the neighborhood of certain saddle points.
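To make the central objects concrete, the following is a sketch in notation assumed here for illustration; the symbols $\mathcal{H}$, $\mathcal{N}$, $\mathbf{w}$, $\mathbf{x}_i$, $y_i$, and $\lambda$ do not appear in the abstract, and the unit-sphere constraint is an assumption. A network output $\mathcal{H}(\mathbf{x};\mathbf{w})$ is two-homogeneous in its weights if

\[
\mathcal{H}(\mathbf{x}; c\,\mathbf{w}) = c^{2}\,\mathcal{H}(\mathbf{x};\mathbf{w}) \quad \text{for all } c \ge 0,
\]

as is the case, for example, for a two-layer ReLU network in which both layers are trained. One natural reading of the neural correlation function over training pairs $(\mathbf{x}_i, y_i)$, $i = 1, \dots, n$, is

\[
\mathcal{N}(\mathbf{w}) = \sum_{i=1}^{n} y_i\, \mathcal{H}(\mathbf{x}_i;\mathbf{w}).
\]

Assuming $\mathcal{N}$ is differentiable at $\mathbf{w}$, the KKT points of maximizing $\mathcal{N}(\mathbf{w})$ subject to $\|\mathbf{w}\|_2^2 = 1$ satisfy

\[
\nabla \mathcal{N}(\mathbf{w}) = 2\lambda\,\mathbf{w}, \qquad \|\mathbf{w}\|_2^2 = 1,
\]

for some scalar $\lambda$. Since $\mathcal{N}$ inherits two-homogeneity from $\mathcal{H}$, Euler's identity gives $\mathbf{w}^{\top}\nabla \mathcal{N}(\mathbf{w}) = 2\,\mathcal{N}(\mathbf{w})$, so that $\lambda = \mathcal{N}(\mathbf{w})$ at any such point. For nonsmooth activations such as ReLU, the gradient would presumably be replaced by a generalized (Clarke) subdifferential.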