This work introduces EE-Tuning, a lightweight and economical solution for training or tuning early-exit large language models (LLMs). In contrast to the common approach of full-parameter pre-training, EE-Tuning augments any pre-trained (and possibly fine-tuned) standard LLM with additional early-exit layers that are tuned in a parameter-efficient manner, requiring significantly fewer computational resources and less training data. Our implementation of EE-Tuning achieves outstanding training efficiency through extensive performance optimizations, and it scales well thanks to its full compatibility with 3D parallelism. Systematic experiments validate the efficacy of EE-Tuning, confirming that effective early-exit LLM inference can be achieved with a limited training budget. In the hope of making early-exit LLMs accessible to the community, we release the source code of our EE-Tuning implementation at https://github.com/pan-x-c/EE-LLM.
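To make the core idea concrete, here is a minimal, framework-free sketch of the two ingredients described above. The backbone layers stay frozen, and only small early-exit heads receive gradient updates. At inference, generation can stop at an intermediate layer. Everything in it is an illustrative assumption rather than the EE-Tuning code. The names `early_exit_forward` and `tune_exit_head_step` are hypothetical. Each head is modeled as a single linear map. The exit rule (stop at the first head whose top softmax probability reaches a `threshold`) is one common criterion that we assume for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per output token


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def linear(weights: Matrix, x: Sequence[float]) -> List[float]:
    # Hypothetical exit head: a plain linear projection to token logits.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def early_exit_forward(
    x: Vector,
    layers: Sequence[Callable[[Vector], Vector]],  # frozen backbone layers
    exit_heads: Dict[int, Matrix],  # layer index -> early-exit head
    final_head: Matrix,  # the backbone's original output head
    threshold: float,  # assumed confidence-based exit rule
) -> Tuple[int, int]:
    """Return (predicted token id, index of the layer where inference stopped)."""
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in exit_heads:
            probs = softmax(linear(exit_heads[i], h))
            conf = max(probs)
            if conf >= threshold:
                # Confident enough: skip all remaining layers.
                return probs.index(conf), i
    probs = softmax(linear(final_head, h))
    return probs.index(max(probs)), len(layers) - 1


def tune_exit_head_step(head: Matrix, h: Vector, target: int, lr: float) -> float:
    """One SGD step of cross-entropy on a single exit head, updated in place.

    The hidden state `h` comes from the frozen backbone and is treated as a
    constant, so only the head's parameters change (parameter-efficient tuning).
    Returns the loss measured before the update.
    """
    probs = softmax(linear(head, h))
    for c, row in enumerate(head):
        # d(loss)/d(logit_c) = p_c - y_c for softmax cross-entropy.
        g = probs[c] - (1.0 if c == target else 0.0)
        for j in range(len(row)):
            row[j] -= lr * g * h[j]
    return -math.log(probs[target])
```

For example, with two frozen layers that each double their input and an identity head after layer 0, a threshold of 0.8 exits at layer 0 (top probability about 0.88). A threshold of 0.95 runs the full backbone. Keeping the backbone fixed and training only the heads is what makes this style of tuning cheap. The gradient never has to flow back through the large pre-trained layers.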