In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture. Widely used LoRA-like methods for fine-tuning LLMs are based on a matrix factorization of the gradient update. We introduce LoTR, a novel approach to parameter-efficient fine-tuning of LLMs, which represents a gradient update to the parameters in the form of a tensor decomposition. The low-rank adapter for each layer is constructed as a product of three matrices, and the tensor structure arises from sharing the left and right multipliers of this product among layers. Simultaneous compression of a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models. Moreover, the core tensor does not depend on the original weight dimension and can be made arbitrarily small, which allows for extremely cheap and fast downstream fine-tuning.
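The adapter structure described above can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation. The function names `lotr_updates`, `lora_param_count`, and `lotr_param_count` are hypothetical. We assume each layer s receives an update ΔW_s = A G_s Bᵀ, where A (d_out × r) and B (d_in × r) are shared across all L layers and G_s is a per-layer r × r core. For LoRA, we assume independent rank-r factors U_s V_sᵀ per layer. Stacking the cores gives an L × r × r tensor whose size does not depend on d_out or d_in.

```python
import numpy as np

def lotr_updates(A, B, cores):
    """Hypothetical helper: per-layer updates Delta W_s = A @ G_s @ B.T.

    A: (d_out, r) shared left factor, B: (d_in, r) shared right factor,
    cores: (L, r, r) stack of per-layer cores. Returns (L, d_out, d_in).
    """
    # result[l, i, j] = sum_{r, s} A[i, r] * cores[l, r, s] * B[j, s]
    return np.einsum("ir,lrs,js->lij", A, cores, B)

def lora_param_count(d_out, d_in, r, n_layers):
    # Assumed LoRA baseline: independent U_s (d_out x r) and V_s (d_in x r) per layer.
    return n_layers * r * (d_out + d_in)

def lotr_param_count(d_out, d_in, r, n_layers):
    # Shared A and B are paid for once; each layer adds only an r x r core.
    return r * (d_out + d_in) + n_layers * r * r

# Small demo with random factors (illustrative values only).
rng = np.random.default_rng(0)
d, r, L = 64, 4, 6
A = rng.standard_normal((d, r))
B = rng.standard_normal((d, r))
cores = rng.standard_normal((L, r, r))
updates = lotr_updates(A, B, cores)
print(updates.shape)  # (6, 64, 64)

# Parameter counts for a hypothetical 32-layer model with 4096 x 4096 weights, rank 8.
print(lora_param_count(4096, 4096, 8, 32))  # 2097152
print(lotr_param_count(4096, 4096, 8, 32))  # 67584
```

At equal rank, LoRA pays for both factors in every layer, while LoTR pays for them once and adds only L·r² core parameters. Using the same rank for both methods is a simplifying assumption for the count, not a claim that the two reach the same accuracy at that rank.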