We present GIFT (Generative Interpretable Fine-tuning Transformers) for fine-tuning pretrained (often large) Transformer models on downstream tasks in a parameter-efficient way with built-in interpretability. Our GIFT is a deep parameter-residual learning method that addresses two problems in fine-tuning a pretrained Transformer model. First, where should parameter-efficient fine-tuning (PEFT) be applied so that it is extremely lightweight yet sufficiently expressive? Second, how should the PEFT be learned so that it exploits the knowledge of the pretrained model more directly? For the former, we select the final projection (linear) layer in the multi-head self-attention of a Transformer model and verify its effectiveness. For the latter, in contrast to prior art that directly introduces new model parameters (often in low-rank approximation form) to be learned in fine-tuning with downstream data, we propose a method that learns to generate the fine-tuning parameters. Our GIFT is a hyper-Transformer that takes the pretrained parameters of the projection layer as input and generates its fine-tuning parameters using a proposed Parameter-to-Cluster Attention (PaCa). PaCa yields a simple clustering-based forward explainer that plays the role of semantic segmentation at test time. In experiments, we evaluate GIFT on the VTAB benchmark and the fine-grained visual classification (FGVC) benchmark, where it obtains significantly better performance than the prior art. Our code is available at https://github.com/savadikarc/gift
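The abstract says only that GIFT maps the pretrained projection weights to their own fine-tuning parameters through PaCa. The NumPy sketch below illustrates that parameter-residual idea, W' = W + ΔW with ΔW = g(W). It is not the authors' implementation; for that, see the repository above. Everything about the internals is an assumption. Each row of the pretrained weight W is treated as a token. PaCa is approximated as a soft clustering of those tokens followed by token-to-cluster attention. The names `paca_generate_residual` and `init_params`, the cluster count `m`, the head width `h`, and the zero initialization of `w_o` are all hypothetical. In this sketch W stays frozen, and only the small generator parameters would be trained.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(d_in, m, h, rng):
    # Hypothetical generator parameters; the sizes m (clusters) and h (head width) are assumptions.
    scale = 1.0 / np.sqrt(d_in)
    return {
        "w_cluster": rng.standard_normal((d_in, m)) * scale,  # token -> cluster logits
        "w_q": rng.standard_normal((d_in, h)) * scale,
        "w_k": rng.standard_normal((d_in, h)) * scale,
        "w_v": rng.standard_normal((d_in, h)) * scale,
        # Zero init (assumption): the generated residual starts at 0, so W' == W initially.
        "w_o": np.zeros((h, d_in)),
    }


def paca_generate_residual(w, params):
    """Generate a residual dW with the same shape as the frozen pretrained weight w.

    Hypothetical sketch: the rows of w (d_out, d_in) are the input tokens.
    """
    x = w  # (d_out, d_in): one token per output row of the projection layer
    # Soft-assign each parameter token to m clusters (each row sums to 1).
    assign = softmax(x @ params["w_cluster"], axis=-1)  # (d_out, m)
    # Cluster summaries: assignment-weighted mean of the tokens in each cluster.
    weights = assign / assign.sum(axis=0, keepdims=True)  # (d_out, m)
    clusters = weights.T @ x  # (m, d_in)
    # Each parameter token attends to the m cluster summaries.
    q = x @ params["w_q"]  # (d_out, h)
    k = clusters @ params["w_k"]  # (m, h)
    v = clusters @ params["w_v"]  # (m, h)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (d_out, m)
    return attn @ v @ params["w_o"]  # (d_out, d_in)


rng = np.random.default_rng(0)
d, m, h = 8, 4, 4
w = rng.standard_normal((d, d))  # stand-in for a pretrained projection weight
params = init_params(d, m, h, rng)
dw = paca_generate_residual(w, params)
w_ft = w + dw  # fine-tuned weight used in the forward pass: y = x @ w_ft.T
```

Because the residual is a function of W itself rather than a free low-rank matrix, the number of trainable parameters in this sketch depends on `d_in`, `m`, and `h` but not on `d_out`. With the zero-initialized `w_o`, training starts exactly from the pretrained layer.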