Two widely studied decentralized learning algorithms are Gossip and random-walk-based learning. Gossip algorithms (both synchronous and asynchronous versions) suffer from high communication cost, while random-walk-based learning suffers from long convergence time. In this paper, we design DIGEST, a fast and communication-efficient asynchronous decentralized learning mechanism that takes advantage of both Gossip and random-walk ideas and focuses on stochastic gradient descent (SGD). DIGEST is an asynchronous decentralized algorithm that builds on local-SGD algorithms, which were originally designed for communication-efficient centralized learning. We design both single-stream and multi-stream DIGEST; the communication overhead may increase with the number of streams, which creates a trade-off between convergence and communication overhead that can be leveraged. We analyze the convergence of single- and multi-stream DIGEST and prove that both algorithms approach the optimal solution asymptotically for both iid and non-iid data distributions. We evaluate the performance of single- and multi-stream DIGEST for logistic regression and a deep neural network, ResNet20. The simulation results confirm that multi-stream DIGEST has favorable convergence properties: its convergence time is better than or comparable to the baselines in the iid setting, and it outperforms the baselines in the non-iid setting.
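To make the two ingredients named above concrete, the following is a minimal toy sketch of random-walk learning combined with local SGD steps. It is not the DIGEST algorithm itself, whose details are not given here. Everything in it is an assumption made for illustration: the ring topology, the quadratic per-node losses, the `1/t` step size, and the hypothetical names `random_walk_local_sgd`, `targets`, `num_visits`, and `local_steps`. Each node holds different data (a non-iid setting), and a single model is passed between neighboring nodes.

```python
import random


def random_walk_local_sgd(targets, num_visits=20000, local_steps=2, seed=0):
    """Toy random-walk SGD on a ring (illustrative assumption, not DIGEST).

    Node i holds the local loss f_i(w) = 0.5 * (w - targets[i]) ** 2,
    so the minimizer of the average loss is the mean of `targets`.
    """
    rng = random.Random(seed)
    n = len(targets)
    node = 0   # node currently holding the model
    w = 0.0    # the model carried by the random walk
    t = 0      # total number of SGD steps taken so far
    for _ in range(num_visits):
        # The visited node runs a few local SGD steps on its own loss.
        for _ in range(local_steps):
            t += 1
            grad = w - targets[node]
            w -= grad / t  # decaying step size 1/t
        # Pass the model to a uniformly chosen ring neighbor.
        node = (node + rng.choice((-1, 1))) % n
    return w


targets = [0.0, 1.0, 2.0, 3.0, 4.0]  # one distinct target per node (non-iid)
w_final = random_walk_local_sgd(targets)
optimum = sum(targets) / len(targets)
print(f"final model: {w_final:.3f}, optimum of the average loss: {optimum:.3f}")
```

With the `1/t` step size, the model equals the running average of the targets of the nodes visited so far, so it drifts toward the global optimum as the walk covers the ring evenly. Only one model is transmitted per hop, which is the communication advantage of random-walk schemes. The large number of visits needed illustrates their slower convergence.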