In this paper we generalize and extend the idea of low-rank adaptation (LoRA) of large language models (LLMs) based on the Transformer architecture. Widely used LoRA-like methods for fine-tuning LLMs are based on a matrix factorization of the gradient update. We introduce LoTR, a novel approach to parameter-efficient fine-tuning of LLMs which represents the gradient update to the parameters in the form of a tensor decomposition. The low-rank adapter for each layer is constructed as a product of three matrices, and the tensor structure arises from sharing the left and right multipliers of this product among layers. Simultaneous compression of a sequence of layers with a low-rank tensor representation allows LoTR to achieve even better parameter efficiency than LoRA, especially for deep models. Moreover, the core tensor does not depend on the original weight dimension and can be made arbitrarily small, which allows for extremely cheap and fast downstream fine-tuning.
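The three-matrix construction described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes square weight matrices of size d × d, a single adapted matrix per layer, and one rank r for both shared factors. The function names (`lora_param_count`, `lotr_param_count`, `lotr_update`) are hypothetical and chosen only for this illustration.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lotr_update(left, core, right):
    """Per-layer update as a product of three matrices: left @ core @ right.

    `left` (d x r) and `right` (r x d) are assumed to be shared by all layers;
    only `core` (r x r) is specific to a layer.
    """
    return matmul(matmul(left, core), right)


def lora_param_count(num_layers, d_model, rank):
    # LoRA-style: every layer owns its own pair of factors, (d x r) and (r x d).
    return num_layers * 2 * d_model * rank


def lotr_param_count(num_layers, d_model, rank):
    # Shared left and right factors are counted once; each layer adds an (r x r) core.
    return 2 * d_model * rank + num_layers * rank * rank


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the paper).
    layers, d, r = 24, 1024, 8
    print("LoRA-style parameters:", lora_param_count(layers, d, r))
    print("Shared-factor parameters:", lotr_param_count(layers, d, r))
```

Under these assumptions the per-layer cost is r × r, independent of d, which is why the count grows slowly with depth: the shared factors are paid for once, and each additional layer contributes only a small core.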