Aligned Large Language Models (LLMs) showcase remarkable versatility, handling diverse real-world tasks. At the same time, aligned LLMs are expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. To address this challenge, we propose CoFiTune, a coarse-to-fine framework that strikes a balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm pinpoints and updates the specific modules crucial for speciality while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates updates to the LLM, mitigating CF without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to full-parameter supervised fine-tuning (SFT), CoFiTune yields roughly a 14% versatility improvement with only marginal speciality loss on a 13B model. Finally, based on further analysis, we offer a speculative insight into the information-forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at https://github.com/rattlesnakey/CoFiTune.
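The soft-masking idea described above can be sketched as elementwise scaling of parameter updates by importance scores, so that weights deemed important for previously learned versatility receive attenuated gradients. This is a minimal illustrative sketch, not the paper's exact procedure; the function name, the mask convention (importance in [0, 1]), and the toy values are all assumptions:

```python
import numpy as np

def soft_masked_update(param, grad, importance, lr=0.1):
    """Apply a gradient step where each weight's update is scaled by
    (1 - importance): an importance of 1.0 fully protects the weight,
    while 0.0 leaves the update unchanged (illustrative only)."""
    mask = 1.0 - importance          # importance scores assumed in [0, 1]
    return param - lr * mask * grad  # attenuated SGD-style step

# Toy example: three weights with increasing importance scores.
param = np.array([1.0, 1.0, 1.0])
grad = np.array([0.5, 0.5, 0.5])
importance = np.array([0.0, 0.5, 1.0])  # 1.0 = fully protected weight
new_param = soft_masked_update(param, grad, importance)
# the fully protected weight stays at 1.0; the others shrink in
# proportion to how unimportant they are for prior knowledge
```

In this sketch a hard-frozen module corresponds to an importance score of 1.0 everywhere, while the soft mask interpolates between full updates and full freezing per weight.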