Stochastic gradient descent with momentum (SGDM) is the dominant algorithm in many optimization scenarios, including convex optimization instances and non-convex neural network training. Yet, in the stochastic setting, momentum interferes with gradient noise, so specific step size and momentum choices are often required to guarantee convergence, let alone acceleration. Proximal point methods, on the other hand, have gained much attention due to their numerical stability and robustness to imperfect tuning. Their stochastic accelerated variants, however, have received limited attention: how momentum interacts with the stability of (stochastic) proximal point methods remains largely unstudied. To address this, we focus on the convergence and stability of the stochastic proximal point algorithm with momentum (SPPAM). We show that, under proper hyperparameter tuning, SPPAM converges linearly to a neighborhood faster than the stochastic proximal point algorithm (SPPA), with a better contraction factor. In terms of stability, we show that SPPAM depends on problem constants more favorably than SGDM, allowing a wider range of step sizes and momentum values that lead to convergence.
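The stability contrast described above can be illustrated with a minimal sketch on a toy one-dimensional problem. Everything here is an assumption made for illustration and is not taken from the abstract: the component functions f_i(x) = (a_i / 2)(x - c_i)^2, the constants, and the update rules themselves. SGDM is written as x_{t+1} = x_t - eta * grad f_i(x_t) + beta * (x_t - x_{t-1}), and SPPAM is assumed to take the same form with the gradient evaluated at the new iterate x_{t+1}, which has a closed-form solution for quadratics.

```python
import random


def sgdm_step(x, x_prev, a, c, eta, beta):
    # Explicit update: the gradient a * (x - c) is evaluated at the current iterate.
    return x - eta * a * (x - c) + beta * (x - x_prev)


def sppam_step(x, x_prev, a, c, eta, beta):
    # Implicit (proximal) update, assumed form:
    #   x_new = x + beta * (x - x_prev) - eta * a * (x_new - c)
    # For a quadratic f_i this equation is linear in x_new and solved exactly.
    y = x + beta * (x - x_prev)
    return (y + eta * a * c) / (1.0 + eta * a)


def run(step, samples, eta, beta, iters, seed=0):
    # Draw one component (a_i, c_i) uniformly at random per iteration.
    rng = random.Random(seed)
    x_prev = x = 0.0
    for _ in range(iters):
        a, c = rng.choice(samples)
        x, x_prev = step(x, x_prev, a, c, eta, beta), x
    return x


# Hypothetical toy data: curvatures a_i in {5, 10}, minimizers c_i near 3.
SAMPLES = [(5.0, 2.9), (5.0, 3.1), (10.0, 2.9), (10.0, 3.1)]
ETA, BETA, ITERS = 1.0, 0.3, 100  # step size deliberately too large for SGDM

x_sgdm = run(sgdm_step, SAMPLES, ETA, BETA, ITERS)
x_sppam = run(sppam_step, SAMPLES, ETA, BETA, ITERS)

print("SGDM  final iterate:", x_sgdm)
print("SPPAM final iterate:", x_sppam)
```

With this step size the explicit update multiplies the error by roughly 1 - eta * a_i at each step, so the SGDM iterate blows up, while the implicit update divides by 1 + eta * a_i and the SPPAM iterate settles in a small neighborhood of 3. This is only a toy instance of the qualitative behavior, not a reproduction of the paper's analysis or experiments.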