LoRA achieves remarkable resource efficiency and comparable performance when adapting LLMs to specific tasks. Since ChatGPT demonstrated superior performance on various tasks, there has been a growing desire to adapt a single model to all tasks. However, the explicit low-rank constraint of LoRA limits adaptation performance in complex multi-task scenarios. Weight updates in LoRA are dominated by a small number of top singular vectors, whereas those of fine-tuning decompose into a set of less important unitary transforms. In this paper, we propose MultiLoRA for better multi-task adaptation by reducing the dominance of the top singular vectors observed in LoRA. MultiLoRA scales LoRA modules horizontally and changes the parameter initialization of the adaptation matrices to reduce parameter dependency, thus yielding more balanced unitary subspaces. For the first time, we construct specialized training data by mixing datasets for instruction following, natural language understanding, and world knowledge, so as to cover semantically and syntactically different samples. With only 2.5% additional parameters, MultiLoRA outperforms its single-LoRA counterparts and fine-tuning on multiple benchmarks and model scales. Further investigation of the weight update matrices of MultiLoRA shows reduced dependency on top singular vectors and more democratic contributions from the unitary transforms.
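To make the idea of scaling LoRA modules horizontally more concrete, the following is a minimal NumPy sketch of one possible reading: several low-rank pairs are applied in parallel and their products are summed into a single weight update. The function names, the dimensions, the random initialization of both factors, and the zero-initialized per-module scales are all assumptions made for illustration; the abstract does not specify them. The helper `top_singular_share` is likewise a hypothetical diagnostic for inspecting how much of an update's singular-value mass sits in its largest singular values.

```python
import numpy as np

def init_multilora(d_out, d_in, rank, n_modules, rng):
    """Create n parallel low-rank (B_i, A_i) pairs plus one scale per module.

    Assumption: both factors get small random values and the per-module
    scales start at zero, so the adapted layer initially equals the frozen one.
    """
    A = rng.normal(0.0, 0.02, size=(n_modules, rank, d_in))
    B = rng.normal(0.0, 0.02, size=(n_modules, d_out, rank))
    scales = np.zeros(n_modules)
    return A, B, scales

def delta_weight(A, B, scales):
    """Sum of the scaled low-rank products: sum_i s_i * B_i @ A_i."""
    return np.einsum("n,nor,nri->oi", scales, B, A)

def forward(x, W, A, B, scales):
    """Apply the frozen weight plus the summed update to x of shape (batch, d_in)."""
    return x @ (W + delta_weight(A, B, scales)).T

def top_singular_share(M, k=1):
    """Fraction of the singular-value mass held by the k largest singular values."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[:k].sum() / s.sum()

rng = np.random.default_rng(0)
d_out, d_in, rank, n_modules = 32, 24, 2, 3
W = rng.normal(size=(d_out, d_in))          # stands in for a frozen pretrained weight
A, B, scales = init_multilora(d_out, d_in, rank, n_modules, rng)
x = rng.normal(size=(4, d_in))

# With zero scales the adapted layer reproduces the frozen layer.
assert np.allclose(forward(x, W, A, B, scales), x @ W.T)

# Pretend training moved the scales away from zero (illustrative values only).
scales = np.ones(n_modules)
dW = delta_weight(A, B, scales)
single = B[0] @ A[0]

# A single module is limited to rank `rank`; the sum can reach n_modules * rank.
print(np.linalg.matrix_rank(single), np.linalg.matrix_rank(dW))
print(top_singular_share(single), top_singular_share(dW))
```

The point of the sketch is structural: one low-rank pair can only produce an update of rank at most `rank`, while the sum of `n_modules` such pairs can spread the update over up to `n_modules * rank` directions. Whether this actually flattens the singular-value spectrum after training is an empirical question that random matrices like these cannot answer.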