Visual models pretrained on large-scale benchmarks encode general knowledge and have proven effective for building more powerful representations for downstream tasks. Most existing approaches follow the fine-tuning paradigm, either initializing the downstream model from the pretrained one or regularizing it toward the pretrained one. The former fails to retain pretrained knowledge during the subsequent fine-tuning phase and is therefore prone to over-fitting; the latter imposes strong constraints on the weights or feature maps of the downstream model without considering semantic drift, often leading to insufficient optimization. To address these issues, we propose a novel fine-tuning framework, distribution regularization with semantic calibration (DR-Tune). It employs distribution regularization by enforcing the downstream task head to decrease its classification error on the pretrained feature distribution, which prevents over-fitting while still allowing the downstream encoder to be trained sufficiently. Furthermore, to alleviate the interference caused by semantic drift, we develop a semantic calibration (SC) module that aligns the global shape and class centers of the pretrained and downstream feature distributions. Extensive experiments on widely used image classification datasets show that DR-Tune consistently improves performance when combined with various backbones under different pretraining strategies. Code is available at: https://github.com/weeknan/DR-Tune.
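The distribution regularization idea described in the abstract can be illustrated with a minimal sketch. It is not the released DR-Tune implementation: the linear head, the function names, and the weighting factor `lam` are assumptions made for illustration, and the semantic calibration step is omitted entirely. The sketch only shows the core objective, in which the same task head is asked to classify both the downstream features and the features produced by the pretrained encoder.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one sample, given raw logits and an integer label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def linear_head(weights, bias, feature):
    """Hypothetical linear task head: one logit per class."""
    return [
        sum(w * x for w, x in zip(row, feature)) + b
        for row, b in zip(weights, bias)
    ]


def dr_loss(weights, bias, downstream_feats, pretrained_feats, labels, lam=1.0):
    """Task loss on downstream features plus a regularizer on pretrained features.

    Assumption: `pretrained_feats` come from a frozen pretrained encoder and
    share labels with `downstream_feats`; `lam` is an illustrative weight.
    """
    n = len(labels)
    task = sum(
        softmax_cross_entropy(linear_head(weights, bias, f), y)
        for f, y in zip(downstream_feats, labels)
    ) / n
    reg = sum(
        softmax_cross_entropy(linear_head(weights, bias, f), y)
        for f, y in zip(pretrained_feats, labels)
    ) / n
    return task + lam * reg, task, reg
```

Because the regularizer constrains only the head's predictions on pretrained features, rather than the encoder's weights or feature maps, the downstream encoder remains free to move away from its initialization. In the full method, the pretrained features would first be calibrated against semantic drift before being passed to the head.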