In this paper, we present a semi-supervised fine-tuning approach designed to improve the performance of pre-trained foundation models on downstream tasks with limited labeled data. By leveraging content-style decomposition within an information-theoretic framework, our method enhances the latent representations of pre-trained vision foundation models, aligning them more effectively with specific task objectives and addressing the problem of distribution shift. We evaluate our approach on multiple datasets, including MNIST, its augmented variants (with yellow and white stripes), CIFAR-10, SVHN, and GalaxyMNIST. The experiments show improvements over the supervised fine-tuning baseline of pre-trained models, particularly in low-labeled-data regimes, across both frozen and trainable backbones for the majority of the tested datasets.