We present LongQLoRA, an efficient and effective method for extending the context length of large language models with fewer training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA, and the Shift Short Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k, within 1000 finetuning steps. LongQLoRA achieves competitive perplexity on the PG19 and Proof-pile datasets: within an evaluation context length of 8192, our model outperforms LongLoRA and is very close to MPT-7B-8K. We collect and build 39k long instruction samples, use them to extend the context length of Vicuna-13B from 4096 to 8192, and achieve good performance on both long- and short-context generation tasks. We also conduct ablation experiments to study the effects of LoRA rank, finetuning steps, and attention patterns at inference. The model weights, training data, and code are available at https://github.com/yangjianxin1/LongQLoRA.
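To make two of the combined ingredients more concrete, here is a minimal sketch in plain Python. It is not the LongQLoRA code. It assumes linear Position Interpolation for RoPE, which rescales positions by `train_len / target_len` (4096 / 8192 here, matching the extension described above). It also assumes a LongLoRA-style shift pattern in which half of the attention heads attend within consecutive token groups and the other half use groups offset by half a group, wrapping around like a roll. The function names and the `group_size` parameter are hypothetical and chosen for illustration only.

```python
def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angles RoPE applies to each of the dim // 2 channel pairs at `position`."""
    return [position * base ** (-2 * i / dim) for i in range(dim // 2)]


def interpolated_position(position: int, train_len: int = 4096, target_len: int = 8192) -> float:
    """Position Interpolation (assumed linear): rescale a position so that the
    extended window [0, target_len) maps back into the pretrained window [0, train_len)."""
    return position * train_len / target_len


def shifted_groups(seq_len: int, group_size: int, shifted: bool) -> list[list[int]]:
    """Token indices that attend to each other under a shift-style grouped attention.

    Unshifted heads attend within consecutive groups of `group_size` tokens.
    Shifted heads use groups offset by group_size // 2, wrapping around the
    sequence (equivalent to rolling the sequence left by group_size // 2).
    """
    offset = group_size // 2 if shifted else 0
    order = [(t + offset) % seq_len for t in range(seq_len)]
    return [order[s:s + group_size] for s in range(0, seq_len, group_size)]


if __name__ == "__main__":
    # The last position of an 8192-token input maps inside the 4096-token training range.
    print(interpolated_position(8191))            # 4095.5
    print(rope_angles(interpolated_position(8191), dim=8))
    print(shifted_groups(8, 4, shifted=False))    # [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(shifted_groups(8, 4, shifted=True))     # [[2, 3, 4, 5], [6, 7, 0, 1]]
```

The shifted groups straddle the boundaries of the unshifted ones, so information can flow between neighboring groups even though each head only attends within a short window. That is the intuition behind using grouped attention during finetuning to save memory.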