Training Large Language Models (LLMs) efficiently at scale presents a formidable challenge, driven by their ever-increasing computational demands and the need for enhanced performance. In this work, we introduce Liger-Kernel, an open-source set of Triton kernels developed specifically for LLM training. With kernel optimization techniques such as kernel operation fusion and input chunking, our kernels achieve on average a 20% increase in training throughput and a 60% reduction in GPU memory usage for popular LLMs compared to HuggingFace implementations. In addition, Liger-Kernel is designed with modularity, accessibility, and adaptability in mind, catering to both casual and expert users. Comprehensive benchmarks and integration tests are built in to ensure compatibility, performance, correctness, and convergence across diverse computing environments and model architectures. The source code is available under a permissive license at: github.com/linkedin/Liger-Kernel.
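To illustrate the input-chunking idea mentioned above, the sketch below computes a cross-entropy loss over row chunks so that the full log-probability matrix is never materialized at once; this mirrors the memory-saving principle, not Liger-Kernel's actual Triton implementation, and all function names here are hypothetical:

```python
import numpy as np

def softmax_xent(logits, targets):
    # Numerically stable cross-entropy for one chunk of rows.
    shifted = logits - logits.max(axis=1, keepdims=True)
    logsumexp = np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    logprobs = shifted - logsumexp
    return -logprobs[np.arange(len(targets)), targets]

def chunked_xent(logits, targets, chunk_size=1024):
    # Process the (tokens, vocab) logits chunk by chunk: peak memory for
    # intermediate log-probabilities scales with chunk_size, not token count.
    losses = [
        softmax_xent(logits[i:i + chunk_size], targets[i:i + chunk_size])
        for i in range(0, len(targets), chunk_size)
    ]
    return np.concatenate(losses).mean()
```

The chunked result is numerically identical to the unchunked loss, since cross-entropy decomposes independently across rows; in an actual kernel, the per-chunk backward pass can also be computed in place to avoid storing full-vocabulary gradients.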