LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention

We present LLaMA-Adapter, a lightweight adaptation method that efficiently fine-tunes LLaMA into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter introduces only 1.2M learnable parameters on top of the frozen LLaMA 7B model and takes less than one hour to fine-tune on 8 A100 GPUs. Specifically, we adopt a set of learnable adaption prompts and prepend them to the word tokens at the higher transformer layers. We then propose a zero-initialized attention mechanism with zero gating, which adaptively injects the new instructional cues into LLaMA while effectively preserving its pre-trained knowledge. With this efficient training, LLaMA-Adapter generates high-quality responses comparable to those of Alpaca, which fully fine-tunes all 7B parameters. Beyond language commands, our approach extends simply to multi-modal instructions, yielding an image-conditioned LLaMA model that achieves superior reasoning performance on the ScienceQA and COCO Caption benchmarks. Furthermore, we evaluate the zero-initialized attention mechanism for fine-tuning other pre-trained models (ViT, RoBERTa) on traditional vision and language tasks, demonstrating the superior generalization capacity of our approach. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
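The zero-gating idea described above can be illustrated with a minimal pure-Python sketch. It is not the released implementation: the function name `gated_attention`, the scalar `gate`, the single-query setup, and the choice to apply softmax separately to the prompt scores and the word-token scores are assumptions made for illustration. The point it shows is that when the gate starts at zero, the prepended adaption prompts contribute nothing, so the frozen model's output is unchanged at the start of training.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def gated_attention(query, prompt_kv, word_kv, gate):
    """Attend one query over adaption prompts and word tokens.

    prompt_kv, word_kv: lists of (key, value) vector pairs (hypothetical layout).
    gate: learnable scalar, assumed to be initialized to 0.0.
    """
    scale = 1.0 / math.sqrt(len(query))
    prompt_scores = [dot(query, k) * scale for k, _ in prompt_kv]
    word_scores = [dot(query, k) * scale for k, _ in word_kv]

    # Assumption: the two score groups are normalized independently,
    # and only the prompt weights are rescaled by the gate.
    prompt_w = [gate * w for w in softmax(prompt_scores)] if prompt_kv else []
    word_w = softmax(word_scores)

    dim = len(word_kv[0][1])
    out = [0.0] * dim
    for w, (_, v) in zip(prompt_w + word_w, prompt_kv + word_kv):
        for i in range(dim):
            out[i] += w * v[i]
    return out

# Toy data: two word tokens and one adaption prompt, all 2-dimensional.
query = [1.0, 0.0]
word_kv = [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0])]
prompt_kv = [([0.5, 0.5], [10.0, 10.0])]

baseline = gated_attention(query, [], word_kv, 0.0)           # no prompts at all
at_init = gated_attention(query, prompt_kv, word_kv, 0.0)     # prompts present, gate = 0
after_some_training = gated_attention(query, prompt_kv, word_kv, 0.5)
```

With `gate = 0.0` the result matches the prompt-free baseline, and as the gate moves away from zero the prompt values are blended in progressively.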

