Distributed optimization can significantly improve the training efficiency of complex deep learning models. However, this process is often hindered by the high cost of communication between the workers and a parameter server at every iteration. To address this bottleneck, we present ${\sf S}^3$GD-MV, a new communication-efficient algorithm that offers the synergistic benefits of both sparsification and sign quantization. Each worker in ${\sf S}^3$GD-MV selects the top-$K$ magnitude components of its local gradient vector and sends only the signs of these components to the server. The server then aggregates the signs via a majority vote rule and returns the result to the workers. Our analysis shows that, under certain mild conditions, ${\sf S}^3$GD-MV can converge at the same rate as signSGD while significantly reducing communication costs, provided that the sparsification parameter $K$ is properly chosen based on the number of workers and the size of the deep learning model. Experimental results on both independent and identically distributed (IID) and non-IID datasets demonstrate that ${\sf S}^3$GD-MV attains higher accuracy than signSGD while significantly reducing communication costs. These findings highlight the potential of ${\sf S}^3$GD-MV as a promising solution for communication-efficient distributed optimization in deep learning.
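The worker and server steps described in the abstract can be sketched as follows. This is a minimal illustration in pure Python, not the authors' implementation: the function names `top_k_signs`, `majority_vote`, and `apply_update`, the dictionary message format, the learning rate `lr`, and the handling of ties and unvoted coordinates (mapped to 0) are all assumptions made for this sketch.

```python
from typing import Dict, List


def top_k_signs(grad: List[float], k: int) -> Dict[int, int]:
    """Worker side: keep the k largest-magnitude components, send only their signs."""
    # Indices sorted by decreasing magnitude; equal magnitudes keep index order.
    order = sorted(range(len(grad)), key=lambda i: -abs(grad[i]))
    message = {}
    for i in order[:k]:
        if grad[i] != 0.0:  # a zero component carries no sign information
            message[i] = 1 if grad[i] > 0 else -1
    return message


def majority_vote(messages: List[Dict[int, int]], dim: int) -> List[int]:
    """Server side: sum the received signs per coordinate, then take the sign of the sum."""
    tally = [0] * dim
    for message in messages:
        for i, s in message.items():
            tally[i] += s
    # Assumption of this sketch: coordinates with no votes or a tie map to 0.
    return [(t > 0) - (t < 0) for t in tally]


def apply_update(params: List[float], vote: List[int], lr: float) -> List[float]:
    """Each worker applies the same signed step returned by the server."""
    return [p - lr * v for p, v in zip(params, vote)]


if __name__ == "__main__":
    # Three hypothetical workers, model dimension 4, K = 2.
    local_grads = [
        [0.9, -0.1, 0.05, -2.0],
        [1.1, 0.2, -0.03, -1.5],
        [-0.7, 0.1, 0.6, -1.8],
    ]
    messages = [top_k_signs(g, k=2) for g in local_grads]
    vote = majority_vote(messages, dim=4)
    print(vote)  # [1, 0, 0, -1]
    print(apply_update([0.0, 0.0, 0.0, 0.0], vote, lr=0.1))
```

In this toy run every worker reports only coordinates 0 and 3, so the server's vote is nonzero only there. Each worker sends $K$ index-sign pairs instead of a full length-$d$ sign vector, which is where the communication saving over signSGD comes from.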