Training deep neural networks is a challenging task. To speed up training and enhance the performance of deep neural networks, we rectify the vanilla conjugate gradient method into a conjugate-gradient-like direction and incorporate it into generic Adam, thus proposing a new optimization algorithm for deep learning, named CG-like-Adam. Specifically, both the first-order and second-order moment estimates of generic Adam are replaced by their conjugate-gradient-like counterparts. The convergence analysis handles the cases where the exponential moving average coefficient of the first-order moment estimate is constant and the first-order moment estimate is unbiased. Numerical experiments on the CIFAR-10/100 datasets demonstrate the superiority of the proposed algorithm.
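The update described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact method: it assumes a conjugate-gradient-like direction d_t = g_t + gamma_t * d_{t-1}, with gamma_t taken as a Fletcher-Reeves-style coefficient damped by 1/t (an assumption), and feeds d_t into Adam's moment estimates in place of the raw gradient.

```python
import math

def cg_like_adam(grad, x0, steps=200, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Sketch of a CG-like-Adam loop on a list-valued parameter vector.

    grad: callable returning the gradient at x as a list of floats.
    The conjugate coefficient choice below is an illustrative assumption.
    """
    x = list(x0)
    n = len(x)
    m = [0.0] * n          # first-order moment estimate
    v = [0.0] * n          # second-order moment estimate
    d_prev = [0.0] * n     # previous conjugate-gradient-like direction
    g_prev = None
    for t in range(1, steps + 1):
        g = grad(x)
        if g_prev is None:
            gamma = 0.0
        else:
            # Fletcher-Reeves-style ratio, damped by 1/t (assumed form)
            num = sum(gi * gi for gi in g)
            den = sum(gi * gi for gi in g_prev) + eps
            gamma = (num / den) / t
        # conjugate-gradient-like direction replaces the raw gradient
        d = [g[i] + gamma * d_prev[i] for i in range(n)]
        m = [b1 * m[i] + (1 - b1) * d[i] for i in range(n)]
        v = [b2 * v[i] + (1 - b2) * d[i] ** 2 for i in range(n)]
        mhat = [mi / (1 - b1 ** t) for mi in m]   # bias correction, as in Adam
        vhat = [vi / (1 - b2 ** t) for vi in v]
        x = [x[i] - lr * mhat[i] / (math.sqrt(vhat[i]) + eps) for i in range(n)]
        d_prev, g_prev = d, g
    return x

# Usage: minimize the toy quadratic f(x) = x1^2 + 2*x2^2
sol = cg_like_adam(lambda x: [2 * x[0], 4 * x[1]], [3.0, -2.0])
```

When gamma is zero the loop reduces to standard Adam, so the sketch isolates the single change the abstract describes: the moment estimates are built from the conjugate-gradient-like direction rather than the gradient itself.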