Multi-Task Learning (MTL) combined with Low-Rank Adaptation (LoRA) has emerged as a promising direction for parameter-efficient deployment of Large Language Models (LLMs). By sharing a single adapter across multiple tasks, one can significantly reduce storage overhead. However, this approach suffers from negative transfer, where conflicting gradient updates from distinct tasks degrade the performance of individual tasks compared to single-task fine-tuning. This problem is exacerbated in LoRA due to the low-rank constraint, which limits the optimization landscape's capacity to accommodate diverse task requirements. In this paper, we propose Ortho-LoRA, a gradient projection method specifically tailored for the bipartite structure of LoRA. Ortho-LoRA dynamically projects conflicting task gradients onto the orthogonal complement of each other within the intrinsic LoRA subspace. Extensive experiments on the GLUE benchmark demonstrate that Ortho-LoRA effectively mitigates task interference, outperforming standard joint training and recovering 95\% of the performance gap between multi-task and single-task baselines with negligible computational overhead.
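The projection step described above can be sketched with a PCGrad-style rule: when two task gradients conflict (negative inner product), one is projected onto the orthogonal complement of the other, removing the interfering component. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `project_conflicting` and the flattened-gradient setting are assumptions for illustration.

```python
import numpy as np

def project_conflicting(g_i: np.ndarray, g_j: np.ndarray) -> np.ndarray:
    """If g_i conflicts with g_j (negative dot product), subtract the
    component of g_i along g_j, i.e. project g_i onto the orthogonal
    complement of g_j. Hypothetical sketch of the kind of update
    Ortho-LoRA applies within the LoRA subspace."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0.0:
        # Remove the conflicting component: g_i <- g_i - (g_i·g_j / ||g_j||^2) g_j
        g_i = g_i - (dot / float(np.dot(g_j, g_j))) * g_j
    return g_i

# Example: two conflicting task gradients on a flattened LoRA factor.
g1 = np.array([1.0, 0.0])
g2 = np.array([-1.0, 1.0])
g1_proj = project_conflicting(g1, g2)
# After projection, g1_proj has zero inner product with g2,
# so applying it no longer opposes task j's update direction.
```

In the LoRA setting, such a projection would be applied to the gradients of the low-rank factors (A and B) rather than to full weight gradients, which is what keeps the overhead negligible.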