To address the communication bottleneck in distributed learning, we introduce a novel two-stage quantization strategy that improves the communication efficiency of distributed stochastic gradient descent (SGD). The method first truncates the gradients to mitigate the impact of long-tailed noise, then applies non-uniform quantization to the truncated gradients based on their statistical characteristics. We provide a convergence analysis of the quantized distributed SGD and establish theoretical guarantees for its performance. By minimizing the convergence error, we further derive optimal closed-form solutions for the truncation threshold and the non-uniform quantization levels under given communication constraints. Both the theoretical analysis and extensive experiments show that the proposed algorithm outperforms existing quantization schemes, achieving a better trade-off between communication efficiency and convergence performance.
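The two stages described above (truncation, then non-uniform quantization of the truncated gradients) can be illustrated with a minimal sketch. This is not the paper's algorithm: the closed-form optimal threshold and levels are not given in the abstract, so the sketch assumes a fixed, hand-picked threshold `tau` and places the quantization levels at empirical quantiles of the truncated gradient. The function names `truncate`, `quantile_levels`, and `stochastic_quantize` are hypothetical, and stochastic rounding is an assumed choice that keeps the quantizer unbiased with respect to the truncated gradient.

```python
import numpy as np


def truncate(grad, tau):
    """Stage 1: clip every gradient entry to [-tau, tau] to limit long-tailed noise."""
    return np.clip(grad, -tau, tau)


def quantile_levels(values, num_levels):
    """Assumed level placement: evenly spaced empirical quantiles (non-uniform in value).

    The paper derives optimal closed-form levels; this quantile rule is only a stand-in.
    """
    qs = np.linspace(0.0, 1.0, num_levels)
    return np.unique(np.quantile(values, qs))  # sorted, duplicates removed


def stochastic_quantize(values, levels, rng):
    """Stage 2: round each value to one of its two neighboring levels at random.

    The probability of rounding up is proportional to the distance from the
    lower level, so the expected output equals the input.
    """
    if len(levels) < 2:
        # Degenerate case: a single level, nothing to choose between.
        return np.full_like(values, levels[0])
    values = np.clip(values, levels[0], levels[-1])
    # Index of the upper neighbor for each value, kept within [1, len(levels) - 1].
    idx = np.clip(np.searchsorted(levels, values, side="left"), 1, len(levels) - 1)
    lo, hi = levels[idx - 1], levels[idx]
    p_up = np.clip((values - lo) / (hi - lo), 0.0, 1.0)
    go_up = rng.random(values.shape) < p_up
    return np.where(go_up, hi, lo)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_t(df=3, size=10_000)  # heavy-tailed stand-in for a gradient
    clipped = truncate(grad, tau=2.0)         # tau chosen by hand for illustration
    levels = quantile_levels(clipped, num_levels=8)  # at most 8 levels, i.e. 3 bits per entry
    quantized = stochastic_quantize(clipped, levels, rng)
    print("levels:", levels)
    print("mean before/after:", clipped.mean(), quantized.mean())
```

In a distributed setting, each worker would send only the level indices (plus the level table) instead of full-precision gradients. The threshold controls the bias introduced by truncation, while the number and placement of levels control the quantization variance, which is the trade-off the abstract says the paper optimizes.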