Distributed machine learning (DML) in mobile environments faces significant communication bottlenecks. Gradient compression has emerged as an effective remedy, offering substantial benefits in settings with limited bandwidth and metered data. Yet existing compressors suffer severe performance drops in non-IID environments because they apply a one-size-fits-all compression approach that does not account for the varying data volumes across workers. Assigning different compression ratios to workers with distinct data distributions and volumes is thus a promising solution. This study presents an analysis of distributed SGD with non-uniform compression, which reveals that the convergence rate (indicative of the number of iterations needed to reach a given accuracy) depends on the compression ratios applied to workers with differing data volumes. Accordingly, we formulate relative compression ratio assignment as an $n$-variable chi-square nonlinear optimization problem constrained by a fixed, limited communication budget. We propose DAGC-R, which assigns more conservative compression to workers handling larger data volumes. Recognizing the computational limitations of mobile devices, we further propose DAGC-A, which is computationally less demanding and enhances the robustness of absolute gradient compressors in non-IID scenarios. Our experiments confirm that both DAGC-A and DAGC-R achieve better performance under highly imbalanced data-volume distributions and restricted communication.
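The abstract does not state DAGC-R's exact objective or solution. The sketch below illustrates only the general idea of data-volume-aware, non-uniform compression under a fixed budget, and it rests on several assumptions. Each worker $i$ has weight $w_i = n_i / \sum_j n_j$. The compressor is top-k sparsification, with worker $i$ keeping a fraction $\delta_i$ of its gradient coordinates. The penalty is taken to be the chi-square-like form $\sum_i w_i^2/\delta_i$, minimized subject to $\sum_i \delta_i = B$ and $0 < \delta_i \le 1$. Under these assumptions, the KKT conditions give $\delta_i = \min(1, c\,w_i)$, where the scalar $c$ is chosen so that the budget is met. As a result, workers with more data keep more coordinates and so receive more conservative compression. The helper names `assign_ratios` and `top_k` are hypothetical.

```python
import numpy as np


def assign_ratios(volumes, budget, max_ratio=1.0, iters=200):
    """Hypothetical data-volume-aware compression-ratio assignment (a sketch).

    Minimizes sum_i w_i**2 / delta_i subject to sum_i delta_i == budget and
    0 < delta_i <= max_ratio, where w_i = n_i / sum(n). The KKT conditions give
    delta_i = min(max_ratio, c * w_i); the scalar c is found by bisection.
    """
    w = np.asarray(volumes, dtype=float)
    if np.any(w <= 0):
        raise ValueError("every worker needs a positive data volume")
    w = w / w.sum()
    if not 0 < budget <= max_ratio * w.size:
        raise ValueError("budget must lie in (0, max_ratio * n_workers]")
    lo, hi = 0.0, max_ratio / w.min()  # at c = hi every worker hits the cap
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.minimum(max_ratio, c * w).sum() < budget:
            lo = c  # too little traffic: increase c
        else:
            hi = c  # budget met or exceeded: decrease c
    return np.minimum(max_ratio, hi * w)


def top_k(grad, ratio):
    """Keep the k = round(ratio * d) largest-magnitude entries (at least one)."""
    flat = grad.ravel()
    k = max(1, int(round(ratio * flat.size)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]
    out = np.zeros_like(flat)
    out[idx] = flat[idx]
    return out.reshape(grad.shape)


# Usage: three workers with imbalanced data volumes and budget = 0.3, i.e. the
# workers together send about 30% of one gradient's coordinates per round.
volumes = [100, 300, 600]
ratios = assign_ratios(volumes, budget=0.3)  # ≈ [0.03, 0.09, 0.18]

rng = np.random.default_rng(0)
d = 1000
grads = [rng.normal(size=d) for _ in volumes]
weights = np.asarray(volumes, dtype=float) / sum(volumes)
# Server-side aggregate: data-volume-weighted sum of the sparsified gradients.
aggregate = sum(wi * top_k(g, r) for wi, g, r in zip(weights, grads, ratios))
print(ratios, [np.count_nonzero(top_k(g, r)) for g, r in zip(grads, ratios)])
```

Bisecting over the single scalar $c$ costs $O(n)$ work per step, which keeps the assignment cheap even on constrained devices. If one worker dominates (for example, volumes `[1, 1, 98]` with budget `2.0`), its ratio saturates at 1 and the remaining budget is split among the other workers.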