We show that language model finetuning can be improved, sometimes dramatically, with a simple augmentation. NEFTune adds noise to the embedding vectors during training. Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings. NEFTune also improves over strong baselines on modern instruction datasets. Models trained with Evol-Instruct see a 10% improvement, with ShareGPT an 8% improvement, and with OpenPlatypus an 8% improvement. Even powerful models further refined with RLHF such as LLaMA-2-Chat benefit from additional training with NEFTune.
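The noise step can be illustrated with a minimal NumPy sketch. The abstract only says that noise is added to the embedding vectors during training, so the details here are assumptions: uniform noise in [-1, 1], a scale of `alpha / sqrt(L * d)` for sequence length `L` and embedding dimension `d`, and the default `alpha=5.0`. The function name `add_embedding_noise` and its parameters are hypothetical and not taken from any library.

```python
import numpy as np


def add_embedding_noise(embeddings, alpha=5.0, rng=None, training=True):
    """Return embeddings with uniform noise added (training only).

    embeddings: array of shape (..., seq_len, dim).
    alpha: noise magnitude hyperparameter (assumed default, not from the abstract).
    """
    if not training:
        # At evaluation time the embeddings are left untouched.
        return embeddings

    rng = np.random.default_rng() if rng is None else rng
    seq_len, dim = embeddings.shape[-2], embeddings.shape[-1]

    # Assumed scaling rule: alpha / sqrt(L * d), so the noise shrinks
    # as sequences get longer or embeddings get wider.
    scale = alpha / np.sqrt(seq_len * dim)

    # One independent sample in [-1, 1] per embedding entry.
    noise = rng.uniform(-1.0, 1.0, size=embeddings.shape)
    return embeddings + scale * noise


# Usage: a batch of 2 sequences, 16 tokens each, 64-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 16, 64))
noisy = add_embedding_noise(emb, alpha=5.0, rng=rng)
```

In an actual finetuning loop, a step like this would sit between the embedding lookup and the first transformer layer and would be switched off at inference, which is what the `training` flag stands in for here.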