Efficient Prompt Tuning of Large Vision-Language Model for Fine-Grained Ship Classification

Fine-grained ship classification in remote sensing (RS-FGSC) is challenging because classes are highly similar and labeled data are scarce, which limits the effectiveness of traditional supervised classification methods. Recent large pre-trained Vision-Language Models (VLMs) have demonstrated impressive few-shot and zero-shot learning capabilities, particularly in understanding image content. This study investigates how VLMs can improve classification accuracy for unseen ship categories, which is important in scenarios where data are restricted by cost or privacy constraints. Directly fine-tuning VLMs for RS-FGSC often overfits the seen classes and generalizes poorly to unseen classes, which underscores how difficult it is to distinguish complex backgrounds and capture distinctive ship features. To address these issues, we introduce a novel prompt tuning technique that employs a hierarchical, multi-granularity prompt design. Our approach integrates remote sensing ship priors through bias terms learned by a small trainable network. This strategy improves the model's generalization while strengthening its ability to discern intricate backgrounds and learn discriminative ship features. We also introduce a comprehensive dataset, FGSCM-52, which significantly expands existing datasets with more extensive data and detailed annotations for less common ship classes. Extensive experiments demonstrate that the proposed method outperforms current state-of-the-art techniques. The source code will be made publicly available.
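
The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea of adding a learned, image-conditioned bias to shared prompt context vectors before scoring classes by cosine similarity. It is not the authors' implementation. The names `MetaNet`, `text_feature`, and `classify`, the averaging stand-in for a frozen text encoder, the temperature value, and all dimensions are assumptions made for illustration, and the hierarchical, multi-granularity part of the prompt design is not modeled here.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0  # avoid division by zero
    return [x / norm for x in v]


class MetaNet:
    """Hypothetical small trainable network: image feature -> bias vector."""

    def __init__(self, dim, rng):
        self.w = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, feat):
        return [math.tanh(dot(row, feat)) for row in self.w]


def text_feature(context, class_embedding):
    # Assumption: a frozen text encoder is replaced here by a plain average
    # of the context tokens and the class-name embedding.
    tokens = context + [class_embedding]
    dim = len(class_embedding)
    return normalize([sum(t[i] for t in tokens) / len(tokens) for i in range(dim)])


def classify(image_feat, context, class_embeddings, meta_net, temperature=0.07):
    image_feat = normalize(image_feat)
    bias = meta_net(image_feat)
    # Shift every learnable context token by the image-conditioned bias.
    shifted = [[c + b for c, b in zip(tok, bias)] for tok in context]
    logits = [
        dot(image_feat, text_feature(shifted, emb)) / temperature
        for emb in class_embeddings
    ]
    # Numerically stable softmax over class logits.
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
dim, n_ctx, n_classes = 8, 4, 3
context = [[rng.gauss(0, 0.02) for _ in range(dim)] for _ in range(n_ctx)]
class_embeddings = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_classes)]
image_feat = [rng.gauss(0, 1) for _ in range(dim)]
probs = classify(image_feat, context, class_embeddings, MetaNet(dim, rng))
print(probs)
```

In a real setup the context vectors and the small network would be the only trained parameters, while the image and text encoders of the VLM stay frozen. Unseen classes are handled by supplying new class-name embeddings without retraining.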
