Low-rank adaptation~(LoRA) has recently attracted considerable interest for fine-tuning foundation models. It substantially reduces the number of trainable parameters by representing the weight change with low-rank matrices $A$ and $B$, i.e., $\Delta W=BA$. Despite this progress, LoRA still faces storage challenges when handling a large number of customized adaptations or larger base models. In this work, we aim to compress the trainable parameters further by exploiting the expressive power of the Fourier transform. Specifically, we introduce FourierFT, which treats $\Delta W$ as a matrix in the spatial domain and learns only a small fraction of its spectral coefficients. From the trained spectral coefficients, we apply the inverse discrete Fourier transform to recover $\Delta W$. Empirically, FourierFT achieves comparable or better performance than LoRA with fewer parameters on a range of tasks, including natural language understanding, natural language generation, instruction tuning, and image classification. For example, in instruction tuning of the LLaMA2-7B model, FourierFT outperforms LoRA with only 0.064M trainable parameters, compared with LoRA's 33.5M. Our code is released at \url{https://github.com/Chaos96/fourierft}.
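The mechanism summarized above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the released implementation: the frequency locations are drawn uniformly at random, the learned coefficients are real-valued, $\Delta W$ is taken as the real part of the 2D inverse DFT, and the scaling factor `alpha` and all function names are hypothetical. The final lines check the parameter counts quoted above, assuming (our assumption) adapters on the query and value projections of all 32 layers of LLaMA2-7B (hidden size 4096), LoRA rank 64, and 1000 spectral coefficients per adapted matrix.

```python
import numpy as np

def sample_frequencies(n, d1, d2, seed=0):
    """Hypothetical helper: choose n distinct (row, col) spectral entries at random."""
    rng = np.random.default_rng(seed)
    flat = rng.choice(d1 * d2, size=n, replace=False)
    return np.unravel_index(flat, (d1, d2))

def fourierft_delta_w(coeffs, rows, cols, d1, d2, alpha=1.0):
    """Sketch: scatter n learned coefficients into an otherwise-zero d1 x d2
    spectrum, then take the real part of the 2D inverse DFT as Delta W."""
    spectrum = np.zeros((d1, d2), dtype=np.complex128)
    spectrum[rows, cols] = coeffs  # only n entries are nonzero (assumed real-valued)
    return alpha * np.real(np.fft.ifft2(spectrum))

def lora_params(d1, d2, r):
    """LoRA stores B (d1 x r) and A (r x d2): r * (d1 + d2) parameters per matrix."""
    return r * (d1 + d2)

# Toy example: 100 trainable coefficients produce a dense 64 x 48 update.
d1, d2, n = 64, 48, 100
rows, cols = sample_frequencies(n, d1, d2)
coeffs = np.random.default_rng(1).standard_normal(n)  # stand-in for trained values
delta_w = fourierft_delta_w(coeffs, rows, cols, d1, d2, alpha=10.0)  # alpha is hypothetical
print(delta_w.shape)  # (64, 48)

# Parameter counts under the assumed LLaMA2-7B configuration (q and v in 32 layers).
n_matrices = 32 * 2
lora_total = n_matrices * lora_params(4096, 4096, r=64)
fourier_total = n_matrices * 1000
print(lora_total, fourier_total)  # 33554432 64000
```

Because each spectral coefficient contributes a full-size cosine/sine pattern after the inverse DFT, a few learned values yield a dense $\Delta W$, and the FourierFT cost per matrix is $n$ regardless of its shape, whereas LoRA's grows as $r(d_1+d_2)$.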