Decentralized training enables learning from distributed datasets generated at different locations without relying on a central server. In realistic scenarios, the data distribution across these sparsely connected learning agents can be significantly heterogeneous, leading to local model over-fitting and poor global model generalization. Another challenge is the high communication cost of training models in such a peer-to-peer fashion without any central coordination. In this paper, we jointly tackle these two practical challenges by proposing SADDLe, a set of sharpness-aware decentralized deep learning algorithms. SADDLe leverages Sharpness-Aware Minimization (SAM) to seek a flatter loss landscape during training, resulting in better model generalization as well as enhanced robustness to communication compression. We present two versions of our approach and conduct extensive experiments to show that SADDLe leads to a 1-20% improvement in test accuracy compared to existing techniques. Additionally, our proposed approach is robust to communication compression, with an average drop of only 1% in the presence of up to 4x compression.
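To make the role of SAM concrete, the following is a minimal sketch of a single generic SAM update on a toy quadratic loss. It is not the SADDLe algorithm itself: the function name `sam_step`, the parameters `lr` and `rho`, and the toy loss are assumptions chosen for illustration, and the decentralized gossip and compression steps are omitted.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One generic SAM update (illustrative, not the paper's exact method)."""
    g = grad_fn(w)
    # Move to a nearby "worst-case" point: step of radius rho along the
    # normalized gradient direction (small constant avoids division by zero).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Take the gradient at the perturbed point, but apply it to the
    # original weights. This biases the update toward flatter regions.
    g_sam = grad_fn(w + eps)
    return w - lr * g_sam

# Hypothetical toy loss L(w) = 0.5 * w^T A w, whose gradient is A @ w.
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w, grad_fn)
final_norm = float(np.linalg.norm(w))
```

With `rho=0.0` the update reduces to plain gradient descent, which is a convenient sanity check. With a constant `rho > 0`, the iterates settle in a small neighborhood of the minimum rather than exactly at it.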