In this paper, we provide convergence guarantees for variants of the stochastic subgradient descent (SGD) method applied to the minimization of nonsmooth nonconvex functions. We first develop a general framework for establishing the global stability of stochastic subgradient methods whose corresponding differential inclusion admits a coercive Lyapunov function. We prove that, with sufficiently small stepsizes and controlled noise, the iterates asymptotically stabilize around the stable set of the corresponding differential inclusion. We then introduce a scheme for developing SGD-type methods with regularized update directions for the primal variables. Using our framework, we prove the global stability of the proposed scheme under mild conditions. We further show that the scheme yields variants of SGD-type methods with guaranteed convergence in training nonsmooth neural networks. In particular, by employing the sign map to regularize the update directions, we propose a novel subgradient method named the Sign-map Regularized SGD method (SRSGD). Preliminary numerical experiments demonstrate the high efficiency of SRSGD in training deep neural networks.
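To make the idea of a sign-map regularized update direction concrete, the following is a minimal sketch, not the exact SRSGD update from the paper. It assumes one plausible form: a momentum average `m` of noisy subgradients, and a primal step along `m + sigma * sign(m)` with diminishing stepsizes. The function names, the parameters `eta0`, `theta`, `sigma`, and `noise_std`, and the toy objective f(x) = Σ_i | |x_i| − 1 | are all illustrative assumptions.

```python
import math
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def objective(x):
    """Toy nonsmooth, nonconvex objective: f(x) = sum_i | |x_i| - 1 |."""
    return sum(abs(abs(xi) - 1.0) for xi in x)


def subgradient(x):
    """One element of the Clarke subdifferential of `objective` at x."""
    return [sign(abs(xi) - 1.0) * sign(xi) for xi in x]


def sign_regularized_sgd_sketch(x0, steps=2000, eta0=0.1, theta=0.1,
                                sigma=0.5, noise_std=0.1, seed=0):
    """Hypothetical sign-regularized momentum subgradient method.

    Assumed update (an illustration, not the paper's exact scheme):
        m <- (1 - theta) * m + theta * (g + noise)
        x <- x - eta_k * (m + sigma * sign(m)),  eta_k = eta0 / sqrt(k + 1)
    """
    rng = random.Random(seed)  # local generator for reproducible noise
    x = list(x0)
    m = [0.0] * len(x)
    for k in range(steps):
        eta = eta0 / math.sqrt(k + 1)  # diminishing stepsize
        g = subgradient(x)
        for i in range(len(x)):
            noisy_g = g[i] + rng.gauss(0.0, noise_std)  # stochastic subgradient
            m[i] = (1.0 - theta) * m[i] + theta * noisy_g
            # The sign map regularizes the momentum-based update direction.
            x[i] -= eta * (m[i] + sigma * sign(m[i]))
    return x


if __name__ == "__main__":
    x_final = sign_regularized_sgd_sketch([3.0, -2.5, 0.5])
    print(x_final, objective(x_final))
```

In this sketch the stepsize shrinks over time while the noise stays bounded, which mirrors the "sufficiently small stepsizes and controlled noise" setting only loosely; the iterates are expected to settle near points where each |x_i| is close to 1, the minimizers of the toy objective.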