Two widely studied decentralized learning algorithms are Gossip and random-walk-based learning. Gossip algorithms (both synchronous and asynchronous variants) suffer from high communication cost, while random-walk-based learning suffers from long convergence time. In this paper, we design DIGEST, a fast and communication-efficient asynchronous decentralized learning mechanism that takes advantage of both Gossip and random-walk ideas and focuses on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm that builds on local-SGD algorithms, which were originally designed for communication-efficient centralized learning. We design both single-stream and multi-stream DIGEST, where the communication overhead may grow with the number of streams, yielding a trade-off between convergence and communication overhead that can be leveraged. We analyze the convergence of single- and multi-stream DIGEST and prove that both algorithms approach the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and for a deep neural network, ResNet20. The simulation results confirm that multi-stream DIGEST has favorable convergence properties: its convergence time is better than or comparable to that of the baselines in the iid setting, and it outperforms the baselines in the non-iid setting.
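To make the local-SGD building block concrete, the following is a minimal sketch (not the DIGEST algorithm itself): each node runs several local SGD steps on its own data shard, and models are then combined by periodic averaging. All names (`local_sgd`, the least-squares objective, the step counts) are illustrative assumptions; DIGEST replaces the synchronized global average shown here with asynchronous digest streams over the network.

```python
import numpy as np

def local_sgd(X_parts, y_parts, rounds=20, local_steps=5, lr=0.1):
    """Illustrative local-SGD sketch on a least-squares problem.

    X_parts, y_parts: per-node data shards (lists of arrays).
    Each round, every node takes `local_steps` gradient steps from the
    shared model, then the local models are averaged. This periodic
    averaging stands in for the communication step that DIGEST makes
    asynchronous and communication-efficient.
    """
    d = X_parts[0].shape[1]
    w = np.zeros(d)  # shared model
    for _ in range(rounds):
        local_models = []
        for X, y in zip(X_parts, y_parts):
            wi = w.copy()
            for _ in range(local_steps):
                grad = X.T @ (X @ wi - y) / len(y)  # squared-loss gradient
                wi -= lr * grad
            local_models.append(wi)
        w = np.mean(local_models, axis=0)  # periodic model averaging
    return w
```

With noiseless data the averaged iterate converges to the common minimizer even though each node only sees its own shard; with non-iid shards, the gap between local minimizers is exactly what makes infrequent communication challenging.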