Parameter-efficient finetuning (PEFT) methods are widely used in LLMs and in generative models for computer vision. In particular, multiple such adapters can be applied during inference to change the behavior of the base model. In this paper we investigated whether multiple LoRA adapters trained on computer vision tasks can be merged and used during inference without loss of performance. If achievable, this allows multitask models to be created simply by merging different LoRAs; merging reduces inference time and requires no additional retraining. We trained adapters on six different tasks and evaluated their performance when merged. For comparison, we used a model with a frozen backbone and a finetuned head. Our results show that even with simple merging techniques, creating a multitask model by merging adapters is feasible, with only a slight loss of performance in some cases. In our experiments we merged up to three adapters. Depending on the task and on the similarity of the data the adapters were trained on, merges can outperform head finetuning. We observed that LoRAs trained on dissimilar datasets tend to perform better after merging than those trained on similar datasets.
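The simple merging described above can be sketched as averaging the low-rank weight deltas of several adapters into the base weights. This is a minimal illustration, not the paper's implementation; the function and variable names (`lora_delta`, `merge_loras`) are hypothetical, and the standard LoRA update form W + (alpha/r) * B @ A is assumed.

```python
import numpy as np

def lora_delta(A, B, alpha, r):
    """Weight update contributed by one LoRA adapter: (alpha / r) * B @ A."""
    return (alpha / r) * (B @ A)

def merge_loras(base_W, adapters):
    """Merge several LoRA adapters into the base weight by averaging
    their deltas (one simple merging strategy among several possible)."""
    deltas = [lora_delta(A, B, alpha, r) for (A, B, alpha, r) in adapters]
    return base_W + np.mean(deltas, axis=0)

# Toy example: an 8x8 base weight matrix and two rank-2 adapters.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
adapters = [
    (rng.standard_normal((2, 8)),  # A: r x d_in
     rng.standard_normal((8, 2)),  # B: d_out x r
     4.0,                          # alpha (scaling)
     2)                            # r (rank)
    for _ in range(2)
]
W_merged = merge_loras(W, adapters)
print(W_merged.shape)  # (8, 8)
```

After merging, inference uses the single matrix `W_merged`, so the per-adapter matrix multiplications are no longer paid at inference time.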