Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLMs). Because LoRAs are modular and plug-and-play, diverse domain-specific LoRAs can be integrated to enhance the capabilities of LLMs. Previous research on exploiting multiple LoRAs either targets specific, isolated downstream tasks or fixes the selection of LoRAs during training. In real-world scenarios, however, LLMs receive diverse prompts covering different tasks, and the pool of candidate LoRAs is often updated dynamically. To bridge this gap, we propose LoraRetriever, a retrieve-then-compose framework that adaptively retrieves and composes multiple LoRAs according to the input prompts. LoraRetriever has three main components: first, identifying and retrieving the LoRAs relevant to a given input; second, formulating strategies for effectively integrating the retrieved LoRAs; and third, enabling efficient batch inference over heterogeneous requests. Experimental results indicate that LoraRetriever consistently outperforms the baselines, highlighting its practical effectiveness and versatility.
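The retrieve-then-compose idea can be illustrated with a minimal sketch. It assumes that each LoRA is indexed by a single embedding vector (the hypothetical `key` field), that prompts are already embedded into the same space, and that retrieved LoRAs are composed by averaging their low-rank updates $B_i A_i x$ on top of the frozen weight $W$. These names, the cosine-similarity retriever, and the averaging rule are illustrative choices, not necessarily the paper's exact retrieval model or composition strategies, and the per-request loop stands in for a genuinely batched kernel.

```python
import math
from dataclasses import dataclass

Vector = list[float]
Matrix = list[list[float]]


@dataclass
class LoRA:
    name: str
    key: Vector  # hypothetical embedding describing the LoRA's domain
    A: Matrix    # r x d_in down-projection
    B: Matrix    # d_out x r up-projection


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, pool: list[LoRA], k: int) -> list[LoRA]:
    """Return the k LoRAs whose keys are most similar to the prompt embedding."""
    return sorted(pool, key=lambda lora: cosine(query, lora.key), reverse=True)[:k]


def mixture_forward(W: Matrix, x: Vector, loras: list[LoRA]) -> Vector:
    """Frozen output W x plus the average of the retrieved updates B_i A_i x."""
    y = matvec(W, x)
    for lora in loras:
        delta = matvec(lora.B, matvec(lora.A, x))
        y = [yi + di / len(loras) for yi, di in zip(y, delta)]
    return y


def batched_forward(W: Matrix, xs: list[Vector], queries: list[Vector],
                    pool: list[LoRA], k: int) -> list[Vector]:
    """Heterogeneous batch: every request is served with its own retrieved LoRAs."""
    return [mixture_forward(W, x, retrieve(q, pool, k)) for x, q in zip(xs, queries)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    pool = [
        LoRA("math", [1.0, 0.0], A=[[1.0, 0.0]], B=[[1.0], [0.0]]),
        LoRA("code", [0.0, 1.0], A=[[0.0, 1.0]], B=[[0.0], [2.0]]),
    ]
    # Two requests with the same input but prompts from different domains.
    print(batched_forward(W, [[1.0, 1.0], [1.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]], pool, k=1))
    # prints [[2.0, 1.0], [1.0, 3.0]]
```

Averaging the outputs of the retrieved adapters keeps every LoRA unmodified, so the pool can change without retraining; an alternative would be to average the parameters of the retrieved LoRAs before the forward pass, which trades per-request flexibility for a single merged adapter.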