Training Large Language Models (LLMs) efficiently at scale is a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-source set of Triton kernels developed specifically for LLM training. With kernel optimization techniques such as kernel operation fusion and input chunking, our kernels achieve on average a 20% increase in training throughput and a 60% reduction in GPU memory usage for popular LLMs compared to HuggingFace implementations. In addition, Liger-Kernel is designed with modularity, accessibility, and adaptability in mind, catering to both casual and expert users. Comprehensive benchmarks and integration tests are built in to ensure compatibility, performance, correctness, and convergence across diverse computing environments and model architectures. The source code is available under a permissive license at: github.com/linkedin/Liger-Kernel.