Stochastic optimization algorithms implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the communication overhead of exchanging information, such as stochastic gradients, between workers. Among the techniques proposed to address this issue, sparse communication with memory and adaptive aggregation are two successful frameworks. In this paper, we combine the advantages of Sparse communication and Adaptive aggregated Stochastic Gradients to design a communication-efficient distributed algorithm named SASG. Specifically, we use the adaptive aggregation rule to determine which workers need to communicate with the parameter server, and then sparsify the information they transmit. As a result, the algorithm reduces both the number of communication rounds and the number of bits communicated in the distributed system. We define an auxiliary sequence and establish convergence results for the algorithm through a Lyapunov function analysis. Experiments on training deep neural networks show that our algorithm significantly reduces communication overhead compared with previous methods, with little impact on training and testing accuracy.
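The two ingredients named in the abstract, skipping uploads under an adaptive rule and sparsifying whatever is uploaded while keeping a memory of the unsent part, can be illustrated with a minimal worker-side sketch. This is not the SASG algorithm itself: the abstract does not state the selection rule, so the fixed `threshold` on the squared change in the gradient is a stand-in assumption, and the names `Worker`, `top_k_sparsify`, `k`, and `threshold` are hypothetical. Top-k selection is likewise assumed here as one common choice of sparsifier.

```python
import numpy as np


def top_k_sparsify(vec, k):
    """Keep the k largest-magnitude entries of vec and zero out the rest."""
    out = np.zeros_like(vec)
    k = min(k, vec.size)
    if k <= 0:
        return out
    # Indices of the k entries with the largest absolute value.
    idx = np.argpartition(np.abs(vec), -k)[-k:]
    out[idx] = vec[idx]
    return out


class Worker:
    """Illustrative worker: skip rule plus top-k sparsification with memory."""

    def __init__(self, dim, k, threshold):
        self.memory = np.zeros(dim)  # residual of entries not yet transmitted
        self.last_grad = None        # gradient at the most recent upload
        self.k = k
        self.threshold = threshold   # assumed fixed; the paper's rule is adaptive

    def step(self, grad):
        """Return a sparse message, or None if this round is skipped."""
        # Assumed stand-in for the adaptive aggregation rule: skip the upload
        # when the gradient changed little since the last upload.
        if self.last_grad is not None:
            change = np.sum((grad - self.last_grad) ** 2)
            if change <= self.threshold:
                return None
        self.last_grad = grad.copy()
        # Add back what earlier rounds left unsent, then sparsify.
        corrected = grad + self.memory
        message = top_k_sparsify(corrected, self.k)
        self.memory = corrected - message
        return message
```

In this sketch a skipped round sends nothing, which saves a communication round, and an uploaded round sends at most `k` nonzero entries, which saves bits. The entries dropped by the sparsifier are kept in `memory` and added to the next uploaded gradient, so they are delayed rather than lost.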