We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to converge faster than conventional DL approaches. In each round of EL, every node sends its model update to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology yields better convergence properties than state-of-the-art static and dynamic topologies. For smooth non-convex loss functions, the number of transient iterations of EL, i.e., the rounds required to reach asymptotic linear speedup, is in $\mathcal{O}(\frac{n^3}{s^2})$, which improves on the best-known bound $\mathcal{O}(n^3)$ by a factor of $s^2$ and indicates the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results show that EL converges up to $1.6\times$ faster than baseline DL algorithms and attains 1.8% higher accuracy for the same communication volume.
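The communication pattern described above, in which each node pushes its model to $s$ randomly chosen peers per round, can be illustrated with a minimal sketch. The abstract does not specify how received models are aggregated or how local training is interleaved, so this sketch makes two assumptions: each node averages its own model uniformly with whatever it received, and local gradient steps are omitted. The names `el_round`, `models`, and `rng` are hypothetical and not taken from the paper.

```python
import random


def el_round(models, s, rng):
    """Simulate one communication round with random peer sampling.

    models: list of n model vectors (lists of floats), one per node.
    s:      number of other nodes each node sends its model to.
    rng:    a random.Random instance, passed in for reproducibility.

    ASSUMPTION: the aggregation rule (uniform average of own model and
    received models) is illustrative; the abstract does not state it.
    """
    n = len(models)
    inbox = [[] for _ in range(n)]

    # Each node i samples s distinct peers (excluding itself) and pushes its model.
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        for j in rng.sample(candidates, s):
            inbox[j].append(models[i])

    # Each node averages its own model with everything it received this round.
    new_models = []
    for i in range(n):
        group = [models[i]] + inbox[i]
        dim = len(models[i])
        new_models.append(
            [sum(m[k] for m in group) / len(group) for k in range(dim)]
        )
    return new_models


if __name__ == "__main__":
    rng = random.Random(0)
    n, s = 8, 3
    # Toy one-dimensional "models" so the mixing effect is easy to inspect.
    models = [[float(i)] for i in range(n)]
    for _ in range(5):
        models = el_round(models, s, rng)
    print([round(m[0], 3) for m in models])
```

Because the peers are resampled every round, the effective communication graph changes from round to round, which is the property the analysis credits for the improved bound. A fuller simulation would add a local training step before each exchange.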