Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Efficiently adapting these powerful models to downstream tasks is therefore increasingly important. In this paper, we study a principled finetuning paradigm for downstream task adaptation, Orthogonal Finetuning (OFT). Despite its good generalizability, OFT still uses a fairly large number of trainable parameters because orthogonal matrices are high-dimensional. To address this, we first examine OFT from an information transmission perspective and then identify a few key desiderata that enable better parameter efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. Applying this parameterization to OFT yields a novel parameter-efficient finetuning method called Orthogonal Butterfly (BOFT). Because it subsumes OFT as a special case, BOFT constitutes a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
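To make the butterfly idea concrete, here is a minimal NumPy sketch under stated assumptions; it is not the paper's construction. The abstract does not give BOFT's block sizes or how each block is parameterized, so this sketch assumes the simplest case: every butterfly factor is a set of d/2 independent 2x2 rotations wired with FFT-style strides 1, 2, 4, ..., d/2, and d is a power of two. It also assumes an OFT-style update of the form W = R W0, where a frozen pretrained weight W0 is multiplied by a learned orthogonal matrix R. The helper names `butterfly_factor` and `butterfly_orthogonal` are hypothetical. The sketch illustrates three points: the product of log2(d) factors is orthogonal; every output coordinate depends on every input coordinate, which is the FFT-like "information transmission" property; and it needs only (d/2)·log2(d) angles, compared with d(d-1)/2 degrees of freedom for a general dense orthogonal matrix.

```python
import numpy as np

def butterfly_factor(d: int, stride: int, angles: np.ndarray) -> np.ndarray:
    """Hypothetical helper: one butterfly factor made of d/2 disjoint 2x2
    rotations, each mixing coordinate i with coordinate i + stride."""
    B = np.eye(d)
    k = 0
    for start in range(0, d, 2 * stride):
        for i in range(start, start + stride):
            j = i + stride
            c, s = np.cos(angles[k]), np.sin(angles[k])
            B[i, i], B[i, j] = c, -s
            B[j, i], B[j, j] = s, c
            k += 1
    return B

def butterfly_orthogonal(d: int, angles: np.ndarray) -> np.ndarray:
    """Hypothetical helper: product of log2(d) butterfly factors with
    strides 1, 2, 4, ..., d/2 (assumes d is a power of two)."""
    m = int(np.log2(d))
    assert 2 ** m == d, "d must be a power of two in this sketch"
    angles = angles.reshape(m, d // 2)
    R = np.eye(d)
    for level in range(m):
        R = butterfly_factor(d, 2 ** level, angles[level]) @ R
    return R

rng = np.random.default_rng(0)
d, n = 8, 5
angles = rng.uniform(-np.pi, np.pi, size=(d // 2) * int(np.log2(d)))
R = butterfly_orthogonal(d, angles)

W0 = rng.standard_normal((d, n))  # stand-in for a frozen pretrained weight
W = R @ W0                        # assumed OFT-style multiplicative update

print(np.allclose(R.T @ R, np.eye(d)))   # R is orthogonal
print(np.allclose(W.T @ W, W0.T @ W0))   # inner products between columns are preserved
print(angles.size, d * (d - 1) // 2)     # butterfly angles vs. dense orthogonal dof
```

Each factor only mixes disjoint pairs of coordinates. However, after log2(d) factors there is exactly one path from any input coordinate to any output coordinate, so R is dense while remaining cheap to parameterize. Restricting every block to a 2x2 rotation means this sketch covers only a subset of all orthogonal matrices. Using larger orthogonal blocks per factor is one way to trade more parameters for more expressiveness.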