In this paper, we study decentralized optimization problems with nonconvex and nonsmooth objective functions, with a particular focus on the decentralized training of nonsmooth neural networks. We introduce a unified framework, named DSM, for analyzing the global convergence of decentralized stochastic subgradient methods. We prove the global convergence of the proposed framework under mild conditions by establishing that the generated sequence asymptotically approximates the trajectories of its associated differential inclusion. Furthermore, we show that the framework encompasses a wide range of existing efficient decentralized subgradient methods, including decentralized stochastic subgradient descent (DSGD), DSGD with gradient tracking (DSGD-T), and DSGD with momentum (DSGDm). In addition, we introduce SignSGD, which applies the sign map to regularize the update directions in DSGDm, and show that it also falls within the proposed framework. Consequently, our convergence results establish, for the first time, the global convergence of these methods when applied to nonsmooth nonconvex objectives. Preliminary numerical experiments demonstrate that the proposed framework yields highly efficient decentralized subgradient methods with convergence guarantees for the training of nonsmooth neural networks.
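To make the kind of iteration discussed above concrete, the following is a minimal sketch of a decentralized stochastic subgradient step with momentum and an optional sign map. It is not the scheme analyzed in the paper: the update rule, the ring mixing matrix, the step-size schedule, the toy objective f_i(x) = ||x - c_i||_1 (nonsmooth but convex, chosen only for simplicity), and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Doubly stochastic mixing matrix for a ring: weight 1/2 on self, 1/4 on each neighbor."""
    W = 0.5 * np.eye(n)
    for i in range(n):
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W


def global_objective(x, C):
    """Toy objective f(x) = sum_i ||x - c_i||_1 (assumed here; nonsmooth at the c_i)."""
    return np.abs(x[None, :] - C).sum()


def decentralized_sign_momentum(C, steps=2000, eta0=0.1, beta=0.9,
                                noise=0.1, use_sign=True, seed=0):
    """Hypothetical sketch: local momentum on noisy subgradients, then mixing plus a step.

    Row i of C holds the data c_i of agent i; row i of X is agent i's local iterate.
    """
    rng = np.random.default_rng(seed)
    n, d = C.shape
    W = ring_mixing_matrix(n)
    X = 5.0 + rng.normal(size=(n, d))  # start away from the data on purpose
    M = np.zeros((n, d))               # local momentum buffers
    for k in range(steps):
        # sign(x - c_i) is a subgradient of ||x - c_i||_1; add Gaussian noise to it
        G = np.sign(X - C) + noise * rng.normal(size=(n, d))
        M = beta * M + (1.0 - beta) * G
        direction = np.sign(M) if use_sign else M  # sign map on the momentum term
        eta = eta0 / np.sqrt(k + 1.0)              # diminishing step size (assumed)
        X = W @ X - eta * direction                # average with neighbors, then step
    return X


C = np.random.default_rng(1).normal(size=(4, 3))  # 4 agents, 3 variables
X = decentralized_sign_momentum(C)
x_bar = X.mean(axis=0)
print("consensus error:", np.linalg.norm(X - x_bar))
print("objective at the average iterate:", global_objective(x_bar, C))
```

Setting `use_sign=False` gives a plain momentum variant in the same sketch. The consensus error printed at the end measures how far the local iterates are from their average, which is the quantity a mixing step is meant to keep small.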