Robust fine-tuning aims to achieve competitive in-distribution (ID) performance while maintaining the out-of-distribution (OOD) robustness of a pre-trained model when transferring it to a downstream task. Recently, projected gradient descent has been successfully used in robust fine-tuning. It explicitly constrains, through projection, how far the fine-tuned model deviates from its initialization. However, two algorithmic limitations, scalability and efficiency, prevent this method from being adopted more widely. In this paper, we propose a new projection-based fine-tuning algorithm, Fast Trainable Projection (FTP), for computationally efficient learning of per-layer projection constraints. It yields an average $35\%$ speedup on our benchmarks compared to prior works. FTP can be combined with existing optimizers such as AdamW and used in a plug-and-play fashion. Finally, we show that FTP is a special instance of hyper-optimizers, which tune the hyper-parameters of optimizers in a learnable manner through nested differentiation. Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios, such as low-label and continual learning settings, thanks to its easy adaptability. The code will be available at https://github.com/GT-RIPL/FTP.git.
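The core idea of projection-based fine-tuning can be illustrated with a minimal sketch. Each update takes an ordinary gradient step and then projects every layer back onto a ball around its pre-trained weights. The sketch rests on several assumptions that are not from the paper: the constraint is an L2 ball, the optimizer is plain SGD, and the per-layer radii are fixed hyper-parameters. FTP instead learns the radii and works with optimizers such as AdamW. The function names `project_to_ball` and `projected_sgd_step` are hypothetical and do not come from the FTP code.

```python
import math


def project_to_ball(w, w0, radius):
    """Project vector w onto the L2 ball of the given radius centred at w0.

    Assumption: an L2-norm constraint; this is not necessarily the norm FTP uses.
    """
    diff = [wi - w0i for wi, w0i in zip(w, w0)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        # Already within the allowed deviation: leave the weights unchanged.
        return list(w)
    # Otherwise shrink the deviation along its own direction to length `radius`.
    scale = radius / norm
    return [w0i + scale * d for w0i, d in zip(w0, diff)]


def projected_sgd_step(layers, layers0, grads, lr, radii):
    """One projected SGD step, applied independently to each layer.

    layers  -- current fine-tuned weights, one flat list per layer
    layers0 -- pre-trained (initial) weights, same shapes
    grads   -- gradients of the loss w.r.t. each layer
    radii   -- hypothetical fixed per-layer radii (FTP learns these instead)
    """
    new_layers = []
    for w, w0, g, r in zip(layers, layers0, grads, radii):
        stepped = [wi - lr * gi for wi, gi in zip(w, g)]  # plain gradient step
        new_layers.append(project_to_ball(stepped, w0, r))  # enforce constraint
    return new_layers
```

Consider a layer that starts at `[0.0, 0.0]` and receives gradient `[-3.0, -4.0]` with `lr=1.0` and radius `1.0`. The gradient step moves it to distance 5 from its initialization, and the projection scales it back to `[0.6, 0.8]`, the point at distance 1 in the same direction. Keeping the projection per layer lets each layer deviate from the pre-trained model by a different amount.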