Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training; it reduces the growing communication traffic required for gradient aggregation. However, existing sparsifiers scale poorly because of the high computational cost of gradient selection and/or increased communication traffic. In particular, communication traffic increases because of gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients only from its own partition, so the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates an accurate threshold that keeps the communication traffic at the user-specified level by minimising the compression ratio error. MiCRO thus enables near-zero-cost gradient sparsification by resolving the problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.
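The two ideas above, disjoint per-worker partitions (which rule out gradient build-up by construction) and feedback-driven threshold estimation, can be sketched as follows. This is a minimal illustration under our own assumptions: the function names, the contiguous partitioning scheme, and the multiplicative feedback rule are hypothetical, not MiCRO's actual algorithm.

```python
import numpy as np

def partition_bounds(n, num_workers):
    # Split indices [0, n) into contiguous, disjoint partitions, one per
    # worker. Disjointness is what prevents gradient build-up: no index
    # can be selected by two workers.
    sizes = [n // num_workers + (1 if i < n % num_workers else 0)
             for i in range(num_workers)]
    bounds = [0]
    for s in sizes:
        bounds.append(bounds[-1] + s)
    return bounds

def sparsify_partition(grad, lo, hi, threshold):
    # A worker selects gradients only from its own slice [lo, hi),
    # keeping entries whose magnitude reaches the current threshold.
    part = grad[lo:hi]
    mask = np.abs(part) >= threshold
    return np.nonzero(mask)[0] + lo, part[mask]

def update_threshold(threshold, actual_density, target_density, gain=0.5):
    # Hypothetical multiplicative feedback: if too many gradients passed
    # (actual > target), raise the threshold; if too few, lower it. This
    # drives the compression ratio error toward zero over iterations.
    ratio = actual_density / max(target_density, 1e-12)
    return threshold * (ratio ** gain)
```

In this sketch, each worker would call `sparsify_partition` on its assigned range every iteration and then adjust its local threshold with `update_threshold` from the observed selection density, so the traffic stays near the user-specified compression ratio without a global top-k sort.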