Parameter-Efficient Fine-Tuning (PEFT) methods, such as Low-Rank Adaptation (LoRA), have significantly reduced the number of trainable parameters required to fine-tune large language models (LLMs). The development of LoRA-style adapters has followed two main directions: (1) enhancing model expressivity with high-rank adapters, and (2) further reducing the parameter count, as exemplified by vector-based methods. However, these directions involve a trade-off: achieving the expressivity of high-rank weight updates typically comes at the cost of the extreme parameter efficiency offered by vector-based techniques. To address this issue, we propose a vector-based random Tensor network for high-Rank Adaptation (TeRA), a novel PEFT method that achieves high-rank weight updates while retaining the parameter efficiency of vector-based PEFT adapters. This is achieved by parametrizing the tensorized weight update matrix as a Tucker-like tensor network (TN), in which large randomly initialized factors are frozen and shared across layers, while only small layer-specific scaling vectors, corresponding to the diagonal entries of the factor matrices, are trained. Comprehensive experiments demonstrate that TeRA matches or even outperforms existing high-rank adapters while requiring as few trainable parameters as vector-based methods. Theoretical analysis and ablation studies further validate the effectiveness of TeRA. The code is available at https://github.com/guyuxuan9/TeRA.
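To make the Tucker-like parametrization more concrete, here is a minimal NumPy sketch of one plausible reading of it. It is not the TeRA implementation. The sketch assumes the following: a 64×64 weight update is tensorized into a 4-way tensor of shape 8×8×8×8, the large frozen and shared random factor is the Tucker core, and each factor matrix is diagonal, with its diagonal given by a trainable layer-specific vector. The helper names `mode_n_product` and `tucker_delta_w`, the mode sizes, and the initialization ranges are all hypothetical. The LoRA comparison uses the standard count r·(d_in + d_out).

```python
import numpy as np

def mode_n_product(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Multiply `tensor` along axis `mode` by `matrix` (shape: new_dim x tensor.shape[mode])."""
    out = np.tensordot(matrix, tensor, axes=([1], [mode]))
    # tensordot places the new axis first; move it back to position `mode`
    return np.moveaxis(out, 0, mode)

def tucker_delta_w(core: np.ndarray, scales: list, out_modes: tuple, in_modes: tuple) -> np.ndarray:
    """Hypothetical TeRA-style update: frozen core x_n diag(lambda_n), reshaped to a matrix."""
    t = core
    for mode, lam in enumerate(scales):
        t = mode_n_product(t, np.diag(lam), mode)  # diagonal factor = per-mode scaling
    d_out, d_in = int(np.prod(out_modes)), int(np.prod(in_modes))
    return t.reshape(d_out, d_in)

rng = np.random.default_rng(0)
out_modes, in_modes = (8, 8), (8, 8)  # assumed tensorization: 64 = 8 * 8 on each side
modes = out_modes + in_modes          # (8, 8, 8, 8)
d_out, d_in = 64, 64

# Large random core: created once, frozen, and shared by every adapted layer.
shared_core = rng.standard_normal(modes)

# Small layer-specific vectors (one per tensor mode); only these would be trained.
layer_scales = {
    name: [rng.uniform(0.5, 1.5, size=n) for n in modes]
    for name in ("layer_0", "layer_1")
}

delta_ws = {name: tucker_delta_w(shared_core, s, out_modes, in_modes)
            for name, s in layer_scales.items()}

# Adapted forward pass for one layer: frozen pretrained weight plus the update.
W0 = rng.standard_normal((d_out, d_in))  # random stand-in for a pretrained weight
x = rng.standard_normal(d_in)
y = (W0 + delta_ws["layer_0"]) @ x

trainable_per_layer = sum(len(v) for v in layer_scales["layer_0"])
lora_r8 = 8 * (d_in + d_out)
print("trainable params per layer:", trainable_per_layer)  # 32
print("LoRA (r=8) params per layer:", lora_r8)             # 1024
print("rank of layer_0 update:", np.linalg.matrix_rank(delta_ws["layer_0"]))
```

Each factor is diagonal here, so the matricized update equals diag(λ1 ⊗ λ2) · G · diag(λ3 ⊗ λ4), where G is the 64×64 matricization of the shared core. Its rank therefore matches that of the random core, which is full rank almost surely, as long as no scaling entry is zero. At the same time, only 32 numbers per layer are trainable. How the actual method tensorizes the weights, sizes the factors, and initializes the vectors may differ from these assumptions.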