This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks from small initializations. The networks considered are assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that, for sufficiently small initializations, the weights of the network remain small in norm during the early stages of training and approximately converge in direction toward the Karush-Kuhn-Tucker (KKT) points of the neural correlation function introduced in [1]. Furthermore, for square loss and under a separability assumption on the network weights, a similar directional convergence of the gradient flow dynamics is established near certain saddle points of the loss function.
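As a reading aid, here is a minimal sketch of the setting in LaTeX. The symbols $\theta$, $\delta$, $L$, and $\mathcal{N}$ are placeholder notation not taken from the abstract, and the precise definition of the neural correlation function is the one given in [1]; only its role in a sphere-constrained problem is sketched here.

```latex
% Gradient flow on the training loss \mathcal{L}, started from a small
% initialization: \delta > 0 is a small scale, \theta_0 a fixed direction
% (placeholder notation, not from the paper).
\dot{\theta}(t) = -\nabla \mathcal{L}\bigl(\theta(t)\bigr),
\qquad \theta(0) = \delta\,\theta_0 .

% Homogeneity of order L > 2: rescaling the weights rescales the output.
f(x;\, c\,\theta) = c^{L} f(x;\, \theta) \quad \text{for all } c > 0 .

% Claimed early-phase behavior: \|\theta(t)\| stays small while the
% direction \theta(t)/\|\theta(t)\| approximately tracks KKT points of a
% sphere-constrained problem built from the neural correlation function
% \mathcal{N} of [1]:
\max_{\|\theta\| = 1} \; \mathcal{N}(\theta) .
```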