We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence than conventional DL approaches. In each round of EL, each node sends its model updates to a random sample of $s$ other nodes (in a system of $n$ nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to state-of-the-art (static and dynamic) topologies. For smooth non-convex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $O(n^3/s^2)$, which outperforms the best-known bound $O(n^3)$ by a factor of $s^2$ and indicates the benefit of randomized communication for DL. We empirically evaluate EL in a 96-node network and compare its performance with state-of-the-art DL approaches. Our results show that EL converges up to $1.7\times$ faster than baseline DL algorithms and attains 2.2% higher accuracy for the same communication volume.
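The per-round communication step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper `el_round`, the placeholder `local_step` callable (standing in for local SGD), and the aggregation rule (each node averages its own updated model with every model it received in that round) are our own hypothetical choices. Models are represented as NumPy vectors.

```python
import random

import numpy as np


def el_round(models, s, local_step, rng):
    """One round of an EL-style randomized push step (illustrative sketch).

    models: list of n parameter vectors (np.ndarray), one per node.
    s: number of distinct random peers each node sends to (requires s < n).
    local_step: hypothetical callable that applies local training to a model.
    rng: random.Random instance used for peer sampling.
    """
    n = len(models)
    # 1. Each node first updates its model locally (placeholder for SGD).
    updated = [local_step(m) for m in models]
    # 2. Each node pushes its updated model to s distinct peers, never itself.
    inbox = [[] for _ in range(n)]
    for i in range(n):
        peers = rng.sample([j for j in range(n) if j != i], s)
        for j in peers:
            inbox[j].append(updated[i])
    # 3. Assumed aggregation: average own model with all models received.
    #    A node that received nothing simply keeps its own updated model.
    return [np.mean([updated[j]] + inbox[j], axis=0) for j in range(n)]


# Usage: 8 nodes, each sending to s = 2 random peers; identity local step,
# so only the communication and averaging are exercised.
rng = random.Random(0)
models = [np.full(3, float(i)) for i in range(8)]
for _ in range(30):
    models = el_round(models, s=2, local_step=lambda m: m, rng=rng)
```

Because every new model is a convex combination of models from the previous round, values never leave their initial range, and repeated rounds pull the nodes toward agreement. Note that with push-based sampling the number of models a node receives varies from round to round, so in this sketch the plain average across nodes is not exactly preserved.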