Deep neural networks (DNNs) are the de facto standard for essential use cases such as image classification, computer vision, and natural language processing. As DNNs and datasets grow, they require distributed training on increasingly large clusters. A main bottleneck is the resulting communication overhead, as workers exchange model updates (i.e., gradients) in every round. To address this bottleneck and accelerate training, a widely deployed approach is compression. However, existing deployments often implement bi-directional compression by simply applying a uni-directional gradient compression scheme in each direction. This results in significant computational overhead at the parameter server (PS) and increased compression error, leading to longer training and lower accuracy. We introduce Tensor Homomorphic Compression (THC), a novel bi-directional compression framework that enables the direct aggregation of compressed values and thus eliminates the aforementioned computational overhead. Moreover, THC is compatible with in-network aggregation (INA), which allows for further acceleration. Our evaluation shows that training representative vision and language models with THC reaches target accuracy 1.40x to 1.47x faster using INA and 1.28x to 1.33x faster using a software PS, compared with state-of-the-art systems.
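The idea of aggregating compressed values directly can be illustrated with a minimal sketch. It is not THC's actual algorithm, which this abstract does not specify. It assumes, purely for illustration, that all workers quantize onto one shared uniform grid, so that the integer codes can be summed without decompression and the sum decoded once. The function names and the parameters `lo`, `hi`, and `levels` are hypothetical.

```python
import random

def quantize(vec, lo, hi, levels, rng):
    """Worker side: stochastically round each value onto a shared uniform
    grid of `levels` points in [lo, hi]; return integer grid indices."""
    step = (hi - lo) / (levels - 1)
    codes = []
    for x in vec:
        x = min(max(x, lo), hi)          # clip into the shared range
        pos = (x - lo) / step            # fractional grid position, >= 0
        base = int(pos)                  # lower neighbour on the grid
        frac = pos - base
        idx = base + (1 if rng.random() < frac else 0)  # unbiased rounding
        codes.append(min(idx, levels - 1))
    return codes

def aggregate(all_codes):
    """Aggregator side: sum integer codes elementwise, no decompression."""
    return [sum(col) for col in zip(*all_codes)]

def decode_sum(summed, n_workers, lo, hi, levels):
    """Decode the aggregated codes once into an estimate of the gradient sum."""
    step = (hi - lo) / (levels - 1)
    return [n_workers * lo + s * step for s in summed]

def decode_one(codes, lo, hi, levels):
    """Decode a single worker's codes (used only to check the property)."""
    step = (hi - lo) / (levels - 1)
    return [lo + c * step for c in codes]

# Example: 4 workers, 8-dimensional gradients, 16 grid levels (assumed values).
rng = random.Random(0)
grads = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(4)]
codes = [quantize(g, -1.0, 1.0, 16, rng) for g in grads]
estimate = decode_sum(aggregate(codes), len(grads), -1.0, 1.0, 16)
```

Because the decoding map is affine and shared by all workers, decoding the sum of the codes gives the same result as summing the individually decoded vectors. The aggregator therefore only performs integer additions, which is the kind of operation the abstract describes as eliminating the decompress-and-recompress overhead.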