Aligned Large Language Models (LLMs) showcase remarkable versatility, handling diverse real-world tasks. At the same time, aligned LLMs are expected to exhibit speciality, excelling in specific applications. However, fine-tuning with extra data, a common practice to gain speciality, often leads to catastrophic forgetting (CF) of previously acquired versatility, hindering the model's performance across diverse tasks. To address this challenge, we propose CoFiTune, a coarse-to-fine framework that strikes a balance between speciality and versatility. At the coarse-grained level, an empirical tree-search algorithm pinpoints and updates the specific modules crucial for speciality while keeping other parameters frozen; at the fine-grained level, a soft-masking mechanism regulates updates to the LLM, mitigating CF without harming speciality. In an overall evaluation of both speciality and versatility, CoFiTune consistently outperforms baseline methods across diverse tasks and model scales. Compared to full-parameter supervised fine-tuning (SFT), CoFiTune yields roughly a 14% versatility improvement with only marginal speciality loss on a 13B model. Finally, based on further analysis, we offer a speculative insight into the information-forwarding process in LLMs, which helps explain the effectiveness of the proposed method. The code is available at https://github.com/rattlesnakey/CoFiTune.
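The soft-masking idea described above can be sketched as elementwise scaling of parameter updates by importance scores, so that weights deemed important for previously learned versatility receive attenuated gradients. This is a minimal illustrative sketch, not the paper's exact procedure; the function name, the mask convention (importance in [0, 1]), and the toy values are all assumptions:

```python
import numpy as np

def soft_masked_update(param, grad, importance, lr=0.1):
    """Apply a gradient step where each weight's update is scaled by
    (1 - importance): an importance of 1.0 fully protects the weight,
    while 0.0 leaves the update unchanged (illustrative only)."""
    mask = 1.0 - importance          # importance scores assumed in [0, 1]
    return param - lr * mask * grad  # attenuated SGD-style step

# Toy example: three weights with increasing importance scores.
param = np.array([1.0, 1.0, 1.0])
grad = np.array([0.5, 0.5, 0.5])
importance = np.array([0.0, 0.5, 1.0])  # 1.0 = fully protected weight
new_param = soft_masked_update(param, grad, importance)
# the fully protected weight stays at 1.0; the others shrink in
# proportion to how unimportant they are for prior knowledge
```

In this sketch a hard-frozen module corresponds to an importance score of 1.0 everywhere, while the soft mask interpolates between full updates and full freezing per weight.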