This paper studies the gradient flow dynamics that arise when training deep homogeneous neural networks, assumed to have locally Lipschitz gradients and an order of homogeneity strictly greater than two. It is shown that for sufficiently small initializations, during the early stages of training, the weights of the neural network remain small in (Euclidean) norm and approximately converge in direction to the Karush-Kuhn-Tucker (KKT) points of the recently introduced neural correlation function. This paper also studies the KKT points of the neural correlation function for feed-forward networks with (Leaky) ReLU and polynomial (Leaky) ReLU activations, deriving necessary and sufficient conditions for rank-one KKT points.
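To make the referenced KKT points concrete, the following is a minimal sketch of the constrained problem from which they arise. The precise definition of the neural correlation function $\mathcal{N}$ is as given in the paper; here it is only assumed that $\mathcal{N}$ is differentiable, positively homogeneous of some order $L > 2$ in the weights $\mathbf{w}$, and that its KKT points are taken with respect to maximization over the unit sphere:
\begin{align}
\max_{\mathbf{w}} \ \mathcal{N}(\mathbf{w}) \quad \text{subject to} \quad \|\mathbf{w}\|^2 = 1.
\end{align}
Stationarity of the Lagrangian $\mathcal{N}(\mathbf{w}) - \tfrac{\lambda}{2}\left(\|\mathbf{w}\|^2 - 1\right)$ gives the KKT conditions
\begin{align}
\nabla \mathcal{N}(\mathbf{w}^*) = \lambda\, \mathbf{w}^*, \qquad \|\mathbf{w}^*\| = 1,
\end{align}
and, under the homogeneity assumption, Euler's identity $\langle \nabla \mathcal{N}(\mathbf{w}^*), \mathbf{w}^* \rangle = L\, \mathcal{N}(\mathbf{w}^*)$ pins down the multiplier as $\lambda = L\, \mathcal{N}(\mathbf{w}^*)$.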