In this paper, we focus on decentralized optimization problems with nonconvex and nonsmooth objective functions, in particular the decentralized training of nonsmooth neural networks. We introduce a unified framework for analyzing the global convergence of decentralized stochastic subgradient-based methods. We prove the global convergence of our proposed framework under mild conditions by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. Furthermore, we show that our framework covers a wide range of existing efficient decentralized subgradient-based methods, including decentralized stochastic subgradient descent (DSGD), DSGD with the gradient-tracking technique (DSGD-T), and DSGD with momentum (DSGD-M). In addition, we introduce the sign map to regularize the update directions in DSGD-M and show that the resulting method is also covered by our framework. Consequently, our convergence results establish, for the first time, the global convergence of these methods when applied to nonsmooth nonconvex objectives. Preliminary numerical experiments demonstrate that our framework yields highly efficient decentralized subgradient-based methods with convergence guarantees for training nonsmooth neural networks.
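To make the three update rules named above concrete, here is a minimal NumPy sketch. Every specific in it is an assumption for illustration and not the paper's algorithm: a ring of agents with a uniform-weight mixing matrix, a toy least-absolute-deviations loss (nonsmooth but convex, chosen only because its minimizer is known), a diminishing step size `eta0 / sqrt(k + 1)`, and a hypothetical variant `dsgd_m_sign` that applies the sign map to an exponential-moving-average momentum buffer. The function names (`ring_mixing_matrix`, `run`, etc.) are likewise hypothetical.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for n agents on a ring (assumed topology)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3
        W[i, (i - 1) % n] += 1.0 / 3
        W[i, (i + 1) % n] += 1.0 / 3
    return W


def objective(x, A_list, b_list):
    """Toy nonsmooth objective: average over agents of mean |A_i x - b_i|."""
    return np.mean([np.mean(np.abs(A @ x - b)) for A, b in zip(A_list, b_list)])


def stochastic_subgrad(x, A, b, rng, batch=4):
    """Mini-batch subgradient of mean |A x - b| (np.sign(0) = 0 is a valid subgradient)."""
    idx = rng.integers(0, A.shape[0], size=batch)
    r = A[idx] @ x - b[idx]
    return A[idx].T @ np.sign(r) / batch


def run(method, A_list, b_list, W, steps=2000, eta0=0.1, beta=0.1, seed=0):
    """Run one decentralized update rule; row i of X is agent i's iterate."""
    rng = np.random.default_rng(seed)
    n, d = len(A_list), A_list[0].shape[1]

    def all_subgrads(X):
        return np.stack([stochastic_subgrad(X[i], A_list[i], b_list[i], rng)
                         for i in range(n)])

    X = np.zeros((n, d))
    G = all_subgrads(X)
    Y = G.copy()          # gradient-tracking variable (DSGD-T)
    M = np.zeros((n, d))  # momentum buffer (hypothetical sign-momentum variant)
    for k in range(steps):
        eta = eta0 / np.sqrt(k + 1)  # diminishing step size (assumed schedule)
        if method == "dsgd":
            X = W @ X - eta * G            # mix, then step along local subgradient
        elif method == "dsgd_t":
            X = W @ X - eta * Y            # mix, then step along tracked direction
        elif method == "dsgd_m_sign":
            M = (1 - beta) * M + beta * G  # EMA momentum
            X = W @ X - eta * np.sign(M)   # sign map regularizes the direction
        else:
            raise ValueError(f"unknown method: {method}")
        G_new = all_subgrads(X)
        if method == "dsgd_t":
            Y = W @ Y + G_new - G  # keeps mean(Y) equal to mean(G)
        G = G_new
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n, d, m = 4, 5, 20
    x_true = rng.normal(size=d)
    A_list = [rng.normal(size=(m, d)) for _ in range(n)]
    b_list = [A @ x_true for A in A_list]  # minimizer is x_true, objective value 0
    W = ring_mixing_matrix(n)
    print("f(0) =", round(objective(np.zeros(d), A_list, b_list), 3))
    for method in ("dsgd", "dsgd_t", "dsgd_m_sign"):
        X = run(method, A_list, b_list, W)
        x_bar = X.mean(axis=0)
        print(method, "f(x_bar) =", round(objective(x_bar, A_list, b_list), 4),
              "consensus error =", round(np.abs(X - x_bar).max(), 4))
```

The three variants share the same communication step `W @ X` and differ only in the direction each agent steps along: its own stochastic subgradient, a tracked estimate of the network-average subgradient, or the sign of a momentum buffer. This is the kind of shared structure a unified framework can exploit, although the paper's exact formulation may differ from this sketch.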