Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.
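To make the butterfly idea concrete, here is a minimal NumPy sketch of how a dense orthogonal matrix can be built from a few sparse orthogonal factors, in the spirit of the Cooley-Tukey factorization. It is an illustration under stated assumptions, not the parameterization used in this work: each factor is built from 2x2 Givens rotations, the dimension `d` is assumed to be a power of two, and the names `butterfly_factor` and `butterfly_orthogonal` are hypothetical.

```python
import numpy as np


def butterfly_factor(d, stride, angles):
    """One sparse orthogonal factor: rotate each index pair (i, i + stride).

    Assumption: plain 2x2 Givens rotations stand in for whatever orthogonal
    blocks an actual method would use.
    """
    B = np.zeros((d, d))
    k = 0
    for i in range(d):
        if i & stride:  # i is the upper partner; already filled with its pair
            continue
        j = i | stride
        c, s = np.cos(angles[k]), np.sin(angles[k])
        B[i, i], B[i, j] = c, -s
        B[j, i], B[j, j] = s, c
        k += 1
    return B


def butterfly_orthogonal(d, angles):
    """Product of log2(d) butterfly factors; angles has shape (log2(d), d // 2)."""
    m = d.bit_length() - 1
    assert 1 << m == d, "this sketch assumes d is a power of two"
    R = np.eye(d)
    for level in range(m):
        R = butterfly_factor(d, 1 << level, angles[level]) @ R
    return R


rng = np.random.default_rng(0)
d = 8
angles = rng.uniform(0.1, 1.0, size=(3, d // 2))  # 12 trainable angles
R = butterfly_orthogonal(d, angles)

# Illustrative frozen weight; an orthogonal update preserves column norms.
W0 = rng.standard_normal((d, 4))
W = R @ W0

print(np.allclose(R.T @ R, np.eye(d)))  # orthogonality check
print(np.count_nonzero(R), "of", d * d, "entries are nonzero")
```

Each factor has only two nonzeros per row, yet the product connects every input coordinate to every output coordinate. The sketch uses (d/2)·log2(d) = 12 angles for d = 8, compared with the d(d-1)/2 = 28 degrees of freedom of an unconstrained 8x8 rotation.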