Recent studies on transfer learning have shown that selectively fine-tuning a subset of layers, or setting a separate learning rate for each layer, can greatly improve robustness to out-of-distribution (OOD) data and preserve the generalization capability of the pre-trained model. However, most of these methods rely on manually crafted heuristics or expensive hyper-parameter searches, which prevents them from scaling to large datasets and neural networks. To solve this problem, we propose the Trainable Projected Gradient Method (TPGM), which automatically learns the constraint imposed on each layer for fine-grained fine-tuning regularization. The method is motivated by formulating fine-tuning as a bi-level constrained optimization problem. Specifically, TPGM maintains a set of projection radii, i.e., distance constraints between the fine-tuned model and the pre-trained model, one for each layer, and enforces them through weight projections. To learn the constraints, we propose a bi-level optimization that finds the best set of projection radii in an end-to-end manner. Theoretically, we show that the bi-level optimization formulation can explain the regularization capability of TPGM. Empirically, with little hyper-parameter search cost, TPGM outperforms existing fine-tuning methods in OOD performance while matching the best in-distribution (ID) performance. For example, when fine-tuned on DomainNet-Real and ImageNet, TPGM achieves relative OOD improvements of $22\%$ and $10\%$, respectively, over vanilla fine-tuning on their sketch counterparts. Code is available at \url{https://github.com/PotatoTian/TPGM}.
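To make the weight-projection step concrete, the following is a minimal sketch and not the released implementation. It assumes an L2 distance between fine-tuned and pre-trained weights (the abstract does not specify the norm), represents each layer as a flat list of floats, and uses fixed radii, whereas TPGM learns the radii through the bi-level optimization. The names `project_to_ball` and `project_model` are hypothetical.

```python
import math


def project_to_ball(weights, pretrained, radius):
    """Project `weights` onto the L2 ball of size `radius` centered at `pretrained`.

    Assumption: L2 distance is used purely for illustration.
    """
    delta = [w - p for w, p in zip(weights, pretrained)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= radius:
        # Already inside the constraint set: leave the weights unchanged.
        return list(weights)
    # Outside the ball: shrink the offset so its norm equals the radius.
    scale = radius / norm
    return [p + scale * d for p, d in zip(pretrained, delta)]


def project_model(model, pretrained_model, radii):
    """Apply a separate projection radius to each layer (hypothetical helper)."""
    return {
        name: project_to_ball(model[name], pretrained_model[name], radii[name])
        for name in model
    }


# Toy example with two "layers"; the radii are fixed by hand here,
# whereas TPGM would learn one radius per layer.
pretrained = {"layer1": [0.0, 0.0], "layer2": [1.0, 1.0]}
finetuned = {"layer1": [3.0, 4.0], "layer2": [1.0, 1.5]}
radii = {"layer1": 1.0, "layer2": 1.0}

projected = project_model(finetuned, pretrained, radii)
print(projected)
```

In this toy run, `layer1` has drifted a distance of 5 from its pre-trained weights and is pulled back onto the radius-1 ball, while `layer2` lies within its radius and is left untouched. A small radius therefore keeps a layer close to the pre-trained model, and a large radius lets it move freely, which is the per-layer control that the learned radii are described as providing.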