Parameter-efficient tuning (PET) has been widely explored in recent years because it tunes far fewer parameters (PET modules) than full-parameter fine-tuning (FT) while still eliciting sufficient knowledge from large language models (LLMs) for downstream tasks. Moreover, when PET is used to serve multiple tasks, different task-specific PET modules can be built on a single frozen LLM, avoiding redundant LLM deployments. Although PET significantly reduces the cost of tuning and deploying LLMs, its inference still suffers from the computational bottleneck of the underlying LLM. To address this issue, we propose an effective PET framework based on compressed LLMs, named "CPET". In CPET, we evaluate the impact of mainstream LLM compression techniques on PET performance and then introduce knowledge inheritance and recovery strategies to restore the knowledge lost through these compression techniques. Our experimental results demonstrate that, owing to the restoring strategies of CPET, combining task-specific PET modules with a compressed LLM achieves performance comparable to combining PET modules with the original, uncompressed LLM, and outperforms directly applying vanilla PET methods to the compressed LLM.