Owing to the remarkable success of generative artificial intelligence (AI), large language models (LLMs) have emerged as a core subclass, underpinning applications such as question answering, text generation, and code completion. While fine-tuning these models on domain-specific data can yield significant performance gains, it also poses daunting computational challenges, especially for researchers and small organizations with limited hardware resources. Although SSD offloading (e.g., ZeRO-Infinity) has emerged as a viable strategy for overcoming the GPU memory barrier by leveraging both system memory (i.e., CPU DRAM) and storage space (i.e., solid-state drives, SSDs), its design primarily targets model-centric performance issues. As a result, key system-level issues, including system memory fragmentation, inefficient pinned-buffer allocation, peak CPU usage spikes, and file system overhead, remain unaddressed, stifling scalability and inflating costs. This observation motivates MemAscend, a framework that systematically tackles the underexplored system-memory bottlenecks in SSD-offloaded LLM training, with a focus on resource-constrained environments. By streamlining pinned-memory allocation, eliminating fragmentation, and mitigating peak overhead, MemAscend reclaims a substantial system-memory budget, enabling larger models, longer context windows, and larger batch sizes without exceeding modest hardware limits. Across diverse LLM benchmarks, MemAscend reduces peak system-memory consumption by an average of 55.7% compared with standard SSD offloading techniques, lowering the hardware barrier to fine-tuning and unlocking new possibilities for cost-effective large-scale training on resource-limited machines.