Training Large Language Models (LLMs) is extremely memory-intensive. To address this problem, existing work such as ZeRO-Offload combines CPU and GPU resources during training. Such techniques have largely democratized billion-scale model training, making it possible to train with only a few consumer-grade graphics cards. However, we observe that existing frameworks often provide only coarse-grained memory management and require experienced experts for configuration tuning, leading to suboptimal hardware utilization and performance. This paper proposes ProTrain, a novel training system that intelligently balances memory usage and performance by coordinating memory, computation, and IO. ProTrain achieves adaptive memory management through Chunk-Based Model State Management and Block-Wise Activation Management, guided by a Memory-Aware Runtime Profiler, without user intervention. ProTrain does not change the training algorithm and thus does not compromise accuracy. Experiments show that ProTrain improves training throughput by 1.43$\times$ to 2.71$\times$ compared to state-of-the-art training systems.