Fine-tuning large pre-trained vision foundation models in a parameter-efficient manner is critical for downstream vision tasks, considering the practical constraints of computational and storage costs. Low-rank adaptation (LoRA) is a well-established technique in this domain, achieving impressive efficiency by reducing the parameter space to a low-rank form. However, developing more advanced low-rank adaptation methods to reduce parameter and memory requirements remains a significant challenge in resource-constrained application scenarios. In this study, we build on the commonly used vision transformer and propose Serial LoRA, a novel LoRA variant that introduces a shared low-rank matrix composed serially with the attention mechanism. This design extracts the underlying commonality of parameters during adaptation, significantly reducing redundancy. Notably, Serial LoRA uses only 1/4 of the parameters of LoRA yet achieves comparable performance in most cases. We conduct extensive experiments on a range of vision foundation models with the transformer structure, and the results confirm the consistent superiority of our method.
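The abstract does not spell out the exact formulation, but one plausible reading of "a shared low-rank matrix composed serially with the attention mechanism" is a single low-rank factor (I + AB) applied before the frozen QKV projection and shared across Q, K, and V, in contrast to one additive A/B pair per projection in vanilla LoRA. The PyTorch sketch below illustrates that reading; the class name, rank, and exact placement of the shared factor are assumptions for illustration, not the paper's verified design.

```python
import torch
import torch.nn as nn


class SerialLoRAAttention(nn.Module):
    """Illustrative sketch of the Serial LoRA idea: one shared low-rank
    update composed serially with a frozen attention block, instead of a
    separate parallel A/B pair per projection as in vanilla LoRA."""

    def __init__(self, dim: int, num_heads: int = 8, rank: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Frozen pre-trained projections (weights loaded from the foundation model).
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        for p in list(self.qkv.parameters()) + list(self.proj.parameters()):
            p.requires_grad = False
        # A single shared low-rank pair for the whole attention block.
        # If vanilla LoRA adapted Q, K, V, and the output projection with
        # separate pairs, sharing one pair would match the ~1/4 parameter
        # count the abstract reports (an assumption, not a stated detail).
        self.lora_a = nn.Parameter(torch.randn(dim, rank) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(rank, dim))  # zero init: adapter starts as identity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        # Serial composition: apply (I + A B) to the input before the frozen
        # QKV projection, so the low-rank update multiplies the pre-trained
        # weights rather than being added to them in parallel.
        x = x + (x @ self.lora_a) @ self.lora_b
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)
        attn = (q @ k.transpose(-2, -1)) * self.head_dim ** -0.5
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

Because the low-rank pair is shared across all projections of the block, only `lora_a` and `lora_b` are trainable, which is what keeps the adapted parameter count a small fraction of per-projection LoRA.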