We present LongQLoRA, an efficient and effective method for extending the context length of large language models with fewer training resources. LongQLoRA combines the advantages of Position Interpolation, QLoRA, and the Shift Short Attention of LongLoRA. With a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192, and even to 12k, within 1000 finetuning steps. LongQLoRA achieves competitive perplexity on the PG19 and Proof-pile datasets: our model outperforms LongLoRA and is very close to MPT-7B-8K within an evaluation context length of 8192. We collect and build a dataset of 39k long instruction samples to extend the context length of Vicuna-13B from 4096 to 8192, achieving good performance on both long- and short-context generation tasks. We also run ablation experiments to study the effect of LoRA rank, finetuning steps, and attention patterns at inference. The model weights, training data, and code are available at https://github.com/yangjianxin1/LongQLoRA.
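As a rough illustration of the Position Interpolation component mentioned above, the sketch below rescales position indices by the ratio of the original to the target context length (4096 / 8192) before computing rotary-embedding angles, so that positions in the extended window map back into the range seen during pretraining. This is a minimal sketch under stated assumptions, not the LongQLoRA implementation: the function name `rope_angles`, the head dimension of 128, and the base of 10000 are illustrative choices that do not come from the abstract.

```python
# Hypothetical helper for illustration only; not taken from the LongQLoRA code.
def rope_angles(position, head_dim, base=10000.0, scale=1.0):
    """Return the rotary angles for one token position.

    scale < 1 applies Position Interpolation: the position index is
    shrunk before the per-frequency angles are computed.
    """
    scaled_position = position * scale
    return [scaled_position / (base ** (2 * i / head_dim))
            for i in range(head_dim // 2)]


ORIGINAL_CONTEXT = 4096   # pretraining context length stated in the abstract
TARGET_CONTEXT = 8192     # extended context length stated in the abstract
PI_SCALE = ORIGINAL_CONTEXT / TARGET_CONTEXT  # 0.5

# The last position of the extended window is mapped to 4095.5,
# which lies inside the original 0..4095 range.
# head_dim=128 is an assumed value for illustration.
last_angles = rope_angles(TARGET_CONTEXT - 1, head_dim=128, scale=PI_SCALE)
```

With this scaling, position 8 in the extended model receives the same angles that position 4 received in the original model, which is why only a short finetuning run is needed to adapt to the interpolated positions.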