Recent research indicates that frequent model communication is a major bottleneck to the efficiency of decentralized machine learning (ML), particularly for large-scale, over-parameterized neural networks (NNs). In this paper, we introduce MALCOM-PSGD, a new decentralized ML algorithm that integrates gradient compression techniques with model sparsification. MALCOM-PSGD uses proximal stochastic gradient descent to handle the non-smoothness introduced by the $\ell_1$ regularization used for model sparsification. Furthermore, we adapt vector source coding and dithering-based quantization to the compressed gradient communication of sparsified models. Our analysis shows that decentralized proximal stochastic gradient descent with compressed communication converges at a rate of $\mathcal{O}\left(\ln(t)/\sqrt{t}\right)$ under a diminishing learning rate, where $t$ denotes the number of iterations. Numerical results verify our theoretical findings and demonstrate that our method reduces communication costs by approximately $75\%$ compared with the state-of-the-art method.
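To make the two building blocks named in the abstract concrete, the following is a minimal sketch, not the MALCOM-PSGD implementation. It assumes the standard soft-thresholding form of the proximal operator for $\ell_1$ regularization and a generic non-subtractive dithered uniform quantizer; the exact quantizer and the vector source coding used in the paper are not specified here. The function names (`soft_threshold`, `proximal_sgd_step`, `dithered_quantize`, `dequantize`) and the parameters `lr`, `lam`, and `step` are hypothetical and chosen for illustration only.

```python
import math
import random


def soft_threshold(x, tau):
    """Proximal operator of tau * ||x||_1: shrink each entry toward zero by tau."""
    out = []
    for v in x:
        if v > tau:
            out.append(v - tau)
        elif v < -tau:
            out.append(v + tau)
        else:
            out.append(0.0)  # small entries become exactly zero (sparsification)
    return out


def proximal_sgd_step(x, grad, lr, lam):
    """One proximal SGD step: gradient step on the smooth loss, then the l1 prox.

    lr is the learning rate and lam the l1 regularization weight (both assumed names).
    """
    stepped = [xi - lr * gi for xi, gi in zip(x, grad)]
    return soft_threshold(stepped, lr * lam)


def dithered_quantize(x, step, rng):
    """Generic dithered uniform quantizer (an assumption, not the paper's exact scheme).

    Adds uniform dither in [0, 1) before flooring, so each integer level is an
    unbiased estimate of v / step. Exact zeros always map to level 0.
    """
    return [math.floor(v / step + rng.random()) for v in x]


def dequantize(levels, step):
    """Map integer levels back to real values."""
    return [q * step for q in levels]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0, -0.2, 0.03, 0.0]
    grad = [0.5, 0.1, -0.2, 0.0]
    x_new = proximal_sgd_step(x, grad, lr=0.1, lam=1.0)
    levels = dithered_quantize(x_new, step=0.25, rng=rng)
    print("sparsified model:", x_new)
    print("integer levels to transmit:", levels)
    print("reconstruction:", dequantize(levels, step=0.25))
```

In this sketch the proximal step sets small entries to exactly zero, and the quantizer maps exact zeros to level 0, so the sparsity produced by the $\ell_1$ term survives quantization. The resulting integer levels are the kind of low-entropy sequence that a source coder could then compress; that encoding step is omitted here.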