Communication bottlenecks and the presence of stragglers pose significant challenges in distributed learning (DL). To address these challenges, recent advances leverage unbiased compression functions and gradient coding. However, the substantial benefits of biased compression remain largely unexplored. To close this gap, we propose Compressed Gradient Coding with Error Feedback (COCO-EF), a novel DL method that combines gradient coding with biased compression to mitigate straggler effects and reduce communication costs. In each iteration, non-straggler devices encode local gradients computed on redundantly allocated training data, incorporate the compression errors carried over from previous rounds, and compress the results with a biased compression function before transmission. The server aggregates the compressed messages from the non-stragglers to approximate the global gradient for the model update. We establish rigorous convergence guarantees for COCO-EF and show, through empirical evaluations, that it outperforms baseline methods in learning performance. To the best of our knowledge, this is among the first works to rigorously demonstrate that biased compression yields substantial benefits in DL when gradient coding is employed to cope with stragglers.
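To make the per-iteration mechanics concrete, the following is a minimal NumPy sketch of the error-feedback loop described above, not the paper's actual implementation. It assumes top-k sparsification as the biased compressor and treats each device's coded gradient as given; the names `top_k`, `device_step`, and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def top_k(v, k):
    """Biased compressor (assumed here): keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def device_step(coded_grad, error, k):
    """Error feedback: add the residual from earlier rounds, then compress."""
    corrected = coded_grad + error
    msg = top_k(corrected, k)
    new_error = corrected - msg  # residual carried over to the next iteration
    return msg, new_error

# Toy run: n devices, d-dimensional gradients, some devices straggle.
rng = np.random.default_rng(0)
n, d, k = 8, 100, 10
errors = [np.zeros(d) for _ in range(n)]
coded_grads = [rng.normal(size=d) for _ in range(n)]  # stand-ins for coded local gradients
stragglers = {2, 5}                                   # these devices do not report this round

msgs = []
for i in range(n):
    if i in stragglers:
        continue  # straggler: sends nothing; its error term is unchanged
    msg, errors[i] = device_step(coded_grads[i], errors[i], k)
    msgs.append(msg)

# Server aggregates the compressed messages from non-stragglers
# to approximate the global gradient for the model update.
approx_global_grad = np.mean(msgs, axis=0)
```

Note that because top-k is biased, the residual `new_error` does not vanish in expectation; feeding it back into the next round is what allows convergence guarantees to be recovered despite the bias.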