MAP: Memory-aware Automated Intra-op Parallel Training For Foundation Models

Recently, large models have achieved state-of-the-art performance in various fields. Supporting large model training requires distributed training techniques. However, finding an efficient distributed execution plan not only requires fine-grained model statistics, such as the memory and computing overhead of each operator, but is also labor-intensive even for an expert in distributed training. In this paper, we introduce MAP, a compiler built upon PyTorch that implements Memory-aware Automated Parallelization. To profile operator costs, existing training systems and machine learning pipelines either physically execute with respect to each operand or estimate memory usage with a scaled input tensor; the former is often time-consuming and the latter often misleading. Compared with existing methods, MAP provides an easy-to-use symbolic profiler that generates memory and computing statistics for an arbitrary PyTorch model at negligible time cost, which improves productivity for ML developers. In addition, MAP can seamlessly speed up different static planning tasks on computation graphs for PyTorch, and it requires only a few lines of modification to user code to generate a new module instance with a top-performing distributed execution plan. The source code is publicly available at https://github.com/hpcaitech/ColossalAI
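
The idea of symbolic profiling, deriving per-operator memory and compute statistics from shapes alone rather than by running the model, can be illustrated with a minimal pure-Python sketch. This is not MAP's profiler or the ColossalAI API: the `Linear`, `OpStats`, and `symbolic_profile` names are hypothetical, the cost formulas cover only a dense layer, and float32 storage is assumed.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Tuple

Shape = Tuple[int, ...]
BYTES_PER_ELEMENT = 4  # assumption: all tensors are float32


@dataclass
class Linear:
    """Hypothetical symbolic stand-in for a dense layer y = x @ W^T + b."""
    in_features: int
    out_features: int

    def out_shape(self, in_shape: Shape) -> Shape:
        if in_shape[-1] != self.in_features:
            raise ValueError(f"expected last dim {self.in_features}, got {in_shape[-1]}")
        return in_shape[:-1] + (self.out_features,)

    def flops(self, in_shape: Shape) -> int:
        rows = prod(in_shape[:-1])
        # Each multiply-add counts as 2 FLOPs; the bias add is 1 FLOP per output.
        return 2 * rows * self.in_features * self.out_features + rows * self.out_features

    def param_bytes(self) -> int:
        n_params = self.in_features * self.out_features + self.out_features
        return n_params * BYTES_PER_ELEMENT


@dataclass
class OpStats:
    name: str
    out_shape: Shape
    flops: int
    param_bytes: int
    activation_bytes: int


def symbolic_profile(layers: List[Linear], in_shape: Shape) -> List[OpStats]:
    """Propagate shapes through the layers; no tensor is ever allocated."""
    stats = []
    shape = in_shape
    for i, layer in enumerate(layers):
        flops = layer.flops(shape)       # cost depends on the input shape
        shape = layer.out_shape(shape)   # then advance to the output shape
        stats.append(OpStats(
            name=f"linear_{i}",
            out_shape=shape,
            flops=flops,
            param_bytes=layer.param_bytes(),
            activation_bytes=prod(shape) * BYTES_PER_ELEMENT,
        ))
    return stats


if __name__ == "__main__":
    mlp = [Linear(1024, 4096), Linear(4096, 1024)]
    for s in symbolic_profile(mlp, (8, 1024)):
        print(s)
```

Because only shapes are propagated, the cost of profiling does not grow with the size of the tensors. In this sketch the trade-off is that every operator needs a hand-written cost formula.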

