Deep neural networks (DNNs) are the de facto standard for essential use cases, such as image classification, computer vision, and natural language processing. As DNNs and datasets grow, they require distributed training on increasingly large clusters. A main bottleneck is then the communication overhead incurred as workers exchange model updates (i.e., gradients) in every round. To address this bottleneck and accelerate training, a widely deployed approach is compression. However, previous deployments often implement bi-directional compression by simply applying a uni-directional gradient compression scheme in each direction. This results in significant computational overhead at the parameter server (PS) and increased compression error, leading to longer training and lower accuracy. We introduce Tensor Homomorphic Compression (THC), a novel bi-directional compression framework that enables the direct aggregation of compressed values while optimizing the bandwidth-to-accuracy tradeoff, thus eliminating the aforementioned overheads. Moreover, THC is compatible with in-network aggregation (INA), which allows for further acceleration. Evaluation over a testbed shows that THC improves time-to-accuracy in comparison to alternatives by up to 1.32x with a software PS and up to 1.51x using INA. Finally, we demonstrate that THC is scalable and tolerant of acceptable packet-loss rates.
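To make "direct aggregation of compressed values" concrete, the following is a minimal sketch of the general idea, not THC's actual algorithm. It assumes that all workers share a quantization range `[lo, hi]` and a number of `levels`, so that the integer indices they send can simply be summed at the parameter server without decompression. The function names and all parameter values are hypothetical.

```python
import random

def quantize(grad, lo, hi, levels, rng):
    """Stochastically round each value to an integer index in [0, levels - 1]."""
    step = (hi - lo) / (levels - 1)
    out = []
    for g in grad:
        # Clamp to the shared range, then map to the index scale.
        x = (min(max(g, lo), hi) - lo) / step
        floor = int(x)
        # Round up with probability equal to the fractional part (unbiased).
        idx = floor + (1 if rng.random() < x - floor else 0)
        out.append(min(idx, levels - 1))
    return out

def aggregate(compressed):
    """Parameter-server step: sum integer indices without decompressing."""
    return [sum(col) for col in zip(*compressed)]

def decode_mean(index_sums, n_workers, lo, hi, levels):
    """Worker step: map summed indices back to an estimate of the mean gradient."""
    step = (hi - lo) / (levels - 1)
    return [lo + (s / n_workers) * step for s in index_sums]

# Hypothetical toy input: two workers, three gradient coordinates each.
rng = random.Random(0)
grads = [[0.10, -0.40, 0.25], [0.30, -0.20, -0.05]]
lo, hi, levels = -0.5, 0.5, 16  # assumed shared range and 4-bit indices

compressed = [quantize(g, lo, hi, levels, rng) for g in grads]
sums = aggregate(compressed)
estimate = decode_mean(sums, len(grads), lo, hi, levels)
print(estimate)
```

Because every worker uses the same affine mapping from indices to values, decoding the sum of indices gives the same result as decoding each worker's message and then averaging. This is why, in this sketch, the server only needs integer addition, an operation that is also plausible to offload to the network.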