DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines

Multi-task model training has been adopted to enable a single deep neural network model (often a large language model) to handle multiple tasks (e.g., question answering and text summarization). Multi-task training commonly receives input sequences of highly different lengths due to the diverse contexts of different tasks. Padding (to the same sequence length) or packing (short examples into long sequences of the same length) is usually adopted to prepare input samples for model training, which is nonetheless not space or computation efficient. This paper proposes a dynamic micro-batching approach to tackle sequence length variation and enable efficient multi-task model training. We advocate pipeline-parallel training of the large model with variable-length micro-batches, each of which potentially comprises a different number of samples. We optimize micro-batch construction using a dynamic programming-based approach, and handle micro-batch execution time variation through dynamic pipeline and communication scheduling, enabling highly efficient pipeline training. Extensive evaluation on the FLANv2 dataset demonstrates up to 4.39x higher training throughput when training T5, and 3.25x when training GPT, as compared with packing-based baselines. DynaPipe's source code is publicly available at https://github.com/awslabs/optimizing-multitask-training-through-dynamic-pipelines.
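The abstract's idea of building variable-length micro-batches with dynamic programming can be illustrated with a small sketch. This is not DynaPipe's actual algorithm or cost model: the function name `split_micro_batches`, the `max_tokens` memory proxy, and the fixed per-micro-batch `overhead` term are all assumptions made for illustration. The sketch sorts samples by length, then picks contiguous cut points that minimize padded tokens plus a per-micro-batch overhead.

```python
from typing import List, Tuple


def split_micro_batches(
    lengths: List[int], max_tokens: int, overhead: int
) -> Tuple[int, List[List[int]]]:
    """Partition samples into micro-batches with a simple DP.

    Assumed (hypothetical) cost of one micro-batch:
        overhead + longest_sample * number_of_samples
    i.e. a fixed launch cost plus the tokens processed after padding
    every sample to the longest one in that micro-batch.
    `max_tokens` caps the padded tokens per micro-batch (a memory proxy).
    """
    xs = sorted(lengths)  # sorting keeps similar lengths adjacent
    n = len(xs)
    inf = float("inf")
    best = [inf] * (n + 1)  # best[j]: minimal cost to cover xs[:j]
    cut = [0] * (n + 1)     # cut[j]: start index of the last micro-batch
    best[0] = 0

    for j in range(1, n + 1):
        # Try every start i for a micro-batch xs[i:j]; xs[j-1] is its max.
        for i in range(j - 1, -1, -1):
            padded = xs[j - 1] * (j - i)
            if padded > max_tokens:
                break  # growing the batch further only increases `padded`
            cost = best[i] + overhead + padded
            if cost < best[j]:
                best[j] = cost
                cut[j] = i

    if best[n] == inf:
        raise ValueError("a single sample exceeds max_tokens")

    # Walk the cut points backwards to recover the micro-batches.
    batches: List[List[int]] = []
    j = n
    while j > 0:
        i = cut[j]
        batches.append(xs[i:j])
        j = i
    batches.reverse()
    return best[n], batches


if __name__ == "__main__":
    sample_lengths = [10, 12, 500, 510, 11]
    total, batches = split_micro_batches(sample_lengths, max_tokens=2048, overhead=50)
    print(total, batches)
    # Padding every sample to the global maximum would process this many tokens:
    print(max(sample_lengths) * len(sample_lengths))
```

With these toy lengths, the short samples and the long samples end up in separate micro-batches, so the short ones are never padded to 510 tokens. A realistic version would replace the token-count cost with a measured execution-time and memory model, and would also have to schedule the resulting uneven micro-batches across pipeline stages, which this sketch does not attempt.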

