Multimodal large language models (MLLMs) fine-tuned on multimodal instruction datasets have demonstrated remarkable capabilities in multimodal tasks. However, fine-tuning all parameters of an MLLM is challenging because these models usually contain billions of parameters. To address this issue, we study parameter-efficient fine-tuning (PEFT) methods for MLLMs, aiming to identify methods that improve MLLM performance when only a limited number of parameters are trained. We conduct empirical studies using four popular PEFT methods to fine-tune the LLM component of open-source MLLMs. Our analysis covers the impact of PEFT methods on different models, the parameters and location of the PEFT module, the size of the fine-tuning data, model stability under each PEFT method, and the generalization and hallucination of MLLMs. We evaluate the four PEFT methods on seven datasets from two categories: unseen and seen datasets. Across all experiments, the adapter is the best-performing PEFT method. In addition, fine-tuning the connector layers improves performance in most MLLMs. Code and data are available at https://github.com/alenai97/PEFT-MLLM.git.