HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies, however, have explored its applicability to heterogeneous clusters, where it could fully exploit available resources for large-model training. This paper presents HAP, an automated system designed to expedite SPMD DNN training on heterogeneous clusters. HAP jointly optimizes the tensor sharding strategy, the sharding ratios across heterogeneous devices, and the communication methods for tensor exchanges. We formulate model partitioning as a program synthesis problem: we generate, from scratch and on a distributed instruction set, a distributed program that semantically resembles the program designed for a single device, and we systematically explore the solution space with an A*-based search algorithm. We derive the optimal tensor sharding ratios by formulating their selection as a linear programming problem. Additionally, HAP explores tensor communication optimization in a heterogeneous cluster and integrates it into the program synthesis process, automatically choosing optimal collective communication primitives and applying the sufficient factor broadcasting technique. Extensive experiments on representative workloads demonstrate that HAP achieves up to 2.41x speed-up on heterogeneous clusters.
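To make the idea of heterogeneity-aware sharding ratios concrete, here is a minimal sketch in Python. It is not the paper's formulation: it assumes a compute-only cost model in which device i processes its share r_i of the work at a relative speed s_i, and it ignores communication entirely. Under that assumption, the linear program "minimize T subject to r_i / s_i <= T, sum(r_i) = 1, r_i >= 0" has a closed-form optimum with r_i proportional to s_i, so no solver is needed. The function names and the speed values are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def balanced_sharding_ratios(device_speeds):
    """Ratios r_i that minimize max_i(r_i / s_i) subject to sum(r_i) == 1.

    Assumption (not from the paper): a compute-only cost model, so the
    optimum makes every device finish at the same time, i.e. r_i ~ s_i.
    """
    if not device_speeds or any(s <= 0 for s in device_speeds):
        raise ValueError("device_speeds must be non-empty and positive")
    speeds = [Fraction(s) for s in device_speeds]
    total = sum(speeds)
    return [s / total for s in speeds]


def step_time(ratios, device_speeds, work=1):
    """Time of the slowest device when device i handles ratios[i] * work."""
    return max(r * work / Fraction(s) for r, s in zip(ratios, device_speeds))


# Hypothetical cluster: one device twice as fast as the other two.
speeds = [2, 1, 1]
ratios = balanced_sharding_ratios(speeds)
even = [Fraction(1, 3)] * 3

print("balanced ratios:", ratios)
print("balanced step time:", step_time(ratios, speeds))
print("even-split step time:", step_time(even, speeds))
```

With these made-up speeds, the speed-proportional split finishes sooner than an even split because the slower devices no longer hold up the step. A formulation that also accounts for communication cost would add further terms and constraints and would generally need an LP solver.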

