We present a generalization of Nesterov's accelerated gradient descent algorithm. Our algorithm, AGNES, provably achieves acceleration for smooth convex minimization tasks with noisy gradient estimates when the noise intensity is proportional to the magnitude of the gradient. Nesterov's accelerated gradient descent does not converge under this noise model when the constant of proportionality exceeds one. AGNES fixes this deficiency and provably achieves an accelerated convergence rate however small the signal-to-noise ratio of the gradient estimate. We also show empirically that this noise model is appropriate for mini-batch gradients in overparameterized deep learning. Finally, we show that AGNES outperforms stochastic gradient descent (SGD) with momentum and Nesterov's method when training convolutional neural networks (CNNs).
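The noise model above can be made concrete with a small sketch. It does not implement AGNES. It shows one possible instance of a gradient oracle whose noise grows in proportion to the gradient's magnitude. We assume Gaussian noise rescaled so that the estimate is unbiased and E‖g − ∇f(x)‖² = σ²‖∇f(x)‖². The function names `noisy_gradient` and `grad_quadratic` and the constant `sigma` are illustrative choices and are not taken from the paper's code. Here `sigma` plays the role of the constant of proportionality, so `sigma > 1` is the regime in which the abstract says Nesterov's method fails to converge.

```python
import numpy as np


def noisy_gradient(grad_fn, x, sigma, rng):
    """Unbiased gradient estimate with noise proportional to ||grad f(x)||.

    Illustrative (assumed) model: g = grad f(x) + sigma * ||grad f(x)|| * z / sqrt(d),
    with z ~ N(0, I_d). Then E[g] = grad f(x) and
    E||g - grad f(x)||^2 = sigma^2 * ||grad f(x)||^2.
    """
    g = grad_fn(x)
    z = rng.standard_normal(g.shape)
    # Scale by 1/sqrt(d) so the expected squared noise norm is sigma^2 * ||g||^2.
    return g + sigma * np.linalg.norm(g) * z / np.sqrt(g.size)


def grad_quadratic(x):
    # Gradient of the hypothetical test objective f(x) = 0.5 * ||x||^2.
    return x


rng = np.random.default_rng(0)
x = np.array([3.0, -4.0])  # ||grad f(x)|| = 5
sigma = 2.0  # noise larger than the signal, i.e. signal-to-noise ratio below one

samples = np.array([noisy_gradient(grad_quadratic, x, sigma, rng) for _ in range(20000)])
mean = samples.mean(axis=0)  # should be close to grad f(x) = x
# Empirical E||g - grad f(x)||^2 / ||grad f(x)||^2, should be close to sigma^2.
rel_var = np.mean(np.sum((samples - x) ** 2, axis=1)) / np.sum(x ** 2)
print(mean, rel_var)
```

Because the noise scales with the gradient, it vanishes at a minimizer, where ∇f = 0. This is the feature that distinguishes this model from the usual assumption of bounded-variance noise.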