Prompting has made large-scale pre-trained models more expressive and powerful, and such models have attracted significant attention in recent years. Although these large models have zero-shot capabilities, labeled data are generally still required to adapt them to downstream tasks. To overcome this limitation, we propose an unsupervised fine-tuning framework that fine-tunes the model or the prompt directly on unlabeled target data. We demonstrate how to apply our method to both language-augmented vision models and masked language models by aligning the discrete distributions extracted from the prompts and the target data. To verify the applicability of our approach, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related tasks, the proposed approach achieves consistent improvements over the baselines.