A scaled conjugate gradient method that accelerates existing adaptive methods using stochastic gradients is proposed for solving nonconvex optimization problems arising in deep neural network training. It is shown theoretically that, with either constant or diminishing learning rates, the proposed method converges to a stationary point of the problem. Additionally, its rate of convergence with diminishing learning rates is verified to be superior to that of the conjugate gradient method. In practical applications to image and text classification, the proposed method is shown to minimize training loss functions faster than the existing adaptive methods. Furthermore, in the training of generative adversarial networks, one variant of the proposed method achieved the lowest Fréchet inception distance score among the adaptive methods compared.
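To make the general idea concrete, the following is a minimal sketch of how a conjugate gradient direction can be combined with stochastic gradients and an adaptive diagonal scaling. The abstract does not specify the paper's actual update rule, scaling factor, conjugacy coefficient, or hyperparameters; the Fletcher–Reeves coefficient, the RMSProp-style scaling, the toy objective, and all parameter values below are illustrative assumptions, not the proposed method itself.

```python
import numpy as np

# Hypothetical sketch: stochastic conjugate gradient direction (Fletcher–Reeves
# coefficient) combined with an RMSProp-style diagonal scaling and a
# diminishing learning rate. This is NOT the paper's algorithm; it only
# illustrates the ingredients named in the abstract.

rng = np.random.default_rng(0)

def stochastic_grad(x):
    """Noisy gradient of a simple nonconvex test function f(x) = sum(x^2 - cos(3x))."""
    return 2.0 * x + 3.0 * np.sin(3.0 * x) + 0.1 * rng.standard_normal(x.shape)

x = rng.standard_normal(10)      # parameters
v = np.zeros_like(x)             # second-moment estimate used for adaptive scaling
d = np.zeros_like(x)             # previous search direction
g_prev_sq = 1.0                  # ||g_{k-1}||^2 for the Fletcher-Reeves coefficient
beta2, eps = 0.99, 1e-8

for k in range(1, 1001):
    g = stochastic_grad(x)
    v = beta2 * v + (1.0 - beta2) * g * g      # adaptive diagonal scaling (assumed)
    fr = float(g @ g) / g_prev_sq              # Fletcher-Reeves beta (one common choice)
    d = -g + fr * d                            # conjugate gradient direction
    lr = 0.1 / np.sqrt(k)                      # diminishing learning rate
    x += lr * d / (np.sqrt(v) + eps)           # scaled (preconditioned) parameter update
    g_prev_sq = float(g @ g)

print("final squared gradient norm:", float(stochastic_grad(x) @ stochastic_grad(x)))
```

Under a diminishing learning rate such as the one above, the squared gradient norm is the quantity one would track when checking convergence to a stationary point, which is the type of guarantee the abstract states for the proposed method.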