Vanilla gradient methods are often highly sensitive to the choice of stepsize, which typically requires manual tuning. Adaptive methods alleviate this issue and have therefore become widely used; among them, AdaGrad has been particularly influential. In this paper, we propose an AdaGrad-style adaptive method in which the adaptation is driven by the cumulative squared norms of successive gradient differences rather than by the gradient norms themselves. The key idea is that when gradients vary little across iterations, the stepsize is not unnecessarily reduced, whereas significant gradient fluctuations, which reflect curvature or instability, automatically damp the stepsize. Numerical experiments demonstrate that the proposed method is more robust than AdaGrad in several practically relevant settings.
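The adaptation rule described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the accumulator `acc`, the base stepsize `eta`, the stabilizer `eps`, and the plain first step taken before any gradient difference exists are all assumptions for the sake of a runnable example.

```python
import numpy as np

def adagrad_diff(grad, x0, eta=0.5, eps=1e-8, n_iters=100):
    """Sketch of an AdaGrad-style method whose stepsize is damped by the
    cumulative squared norms of successive gradient differences,
    ||g_k - g_{k-1}||^2, instead of the gradient norms themselves."""
    x = np.asarray(x0, dtype=float)
    g_prev = grad(x)
    acc = 0.0                       # accumulates ||g_k - g_{k-1}||^2
    x = x - eta * g_prev            # plain first step: no difference yet
    for _ in range(n_iters - 1):
        g = grad(x)
        acc += np.sum((g - g_prev) ** 2)   # grows only when gradients fluctuate
        x = x - (eta / np.sqrt(acc + eps)) * g
        g_prev = g
    return x
```

Note the intended behavior: if successive gradients are nearly identical, `acc` barely grows and the stepsize is not needlessly reduced, while large iterate-to-iterate gradient swings inflate `acc` and shrink the step, matching the damping behavior described in the abstract.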