In this paper, we focus on decentralized stochastic subgradient-based methods for minimizing nonsmooth nonconvex functions without Clarke regularity, with particular emphasis on the decentralized training of nonsmooth neural networks. We propose a general framework that unifies various decentralized subgradient-based methods, such as decentralized stochastic subgradient descent (DSGD), DSGD with the gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). To establish the convergence properties of the proposed framework, we relate the discrete iterates to the trajectories of a continuous-time differential inclusion, which is assumed to admit a coercive Lyapunov function with a stable set $\mathcal{A}$. We prove the asymptotic convergence of the iterates to the stable set $\mathcal{A}$ under sufficiently small and diminishing step-sizes. These results provide the first convergence guarantees for several well-recognized decentralized stochastic subgradient-based methods without Clarke regularity of the objective function. Preliminary numerical experiments demonstrate that our proposed framework yields highly efficient decentralized stochastic subgradient-based methods with convergence guarantees for training nonsmooth neural networks.
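For concreteness, one scheme covered by a framework of this type is plain DSGD; the display below is only an illustrative sketch, in which the mixing weights $w_{ij}$, local objectives $f_i$, step-sizes $\eta_k$, and number of nodes $n$ are notation assumed here rather than taken from the abstract:
\[
  x_i^{k+1} \;=\; \sum_{j=1}^{n} w_{ij}\, x_j^{k} \;-\; \eta_k\, g_i^{k},
  \qquad g_i^{k} \in \partial f_i\!\big(x_i^{k}\big) + \text{noise},
\]
where $\partial$ denotes the Clarke subdifferential. Under this reading, the convergence analysis compares such iterates with trajectories of a differential inclusion sketched, for instance, as $\dot{x}(t) \in -\partial\big(\tfrac{1}{n}\sum_{i=1}^{n} f_i\big)(x(t))$.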