LPT: Long-tailed Prompt Tuning for Image Classification

For long-tailed classification, most works pretrain a large model on a large-scale dataset and then fine-tune the whole model to adapt it to long-tailed data. Though promising, fine-tuning the whole pretrained model incurs high computation cost, requires deploying a different model for each task, and weakens generalization because the model overfits to certain features of the long-tailed data. To alleviate these issues, we propose an effective Long-tailed Prompt Tuning (LPT) method for long-tailed classification. LPT introduces several trainable prompts into a frozen pretrained model to adapt it to long-tailed data. For better effectiveness, we divide the prompts into two groups: 1) a shared prompt for the whole long-tailed dataset, which learns general features and adapts the pretrained model to the target domain; and 2) group-specific prompts, which gather group-specific features for samples with similar features and give the pretrained model discriminative ability. We then design a two-phase training paradigm to learn these prompts. In phase 1, we train the shared prompt via supervised prompt tuning to adapt the pretrained model to the desired long-tailed domain. In phase 2, we use the learnt shared prompt as a query to select, from the group-specific prompt set, a small best-matched subset for each group of similar samples in order to mine the common features of these samples; we then optimize these prompts with a dual sampling strategy and an asymmetric GCL loss. By fine-tuning only a few prompts while keeping the pretrained model fixed, LPT reduces training and deployment cost, since only a few prompts need to be stored, and retains the strong generalization ability of the pretrained model. Experiments show that on various long-tailed benchmarks, with only ~1.1% extra parameters, LPT achieves performance comparable to previous whole-model fine-tuning methods and is more robust to domain shift.
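The phase-2 selection step can be illustrated with a minimal sketch. The abstract only states that a query selects a small best-matched set from the group-specific prompts; the rest is an assumption made here for illustration: that the query is a feature vector obtained with the shared prompt, that each group-specific prompt carries a learnable key, and that matching uses cosine similarity with a top-k cut. The names `cosine`, `select_prompts`, `keys`, and `k` are hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_prompts(query, keys, prompts, k):
    """Return the indices and prompts of the k keys most similar to the query.

    Assumption: one key per group-specific prompt; similarity is cosine.
    """
    scores = [cosine(query, key) for key in keys]
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    return top, [prompts[i] for i in top]


# Toy data: a 2-d query and four group-specific prompts with their keys.
query = [1.0, 0.0]
keys = [[1.0, 0.1], [0.0, 1.0], [0.9, -0.2], [-1.0, 0.0]]
prompts = ["P0", "P1", "P2", "P3"]

idx, chosen = select_prompts(query, keys, prompts, k=2)
print(idx, chosen)
```

In this toy setup the two keys that point in roughly the same direction as the query are selected. In an actual model the selected prompts would be token embeddings inserted into the frozen backbone, and only the prompts, their keys, and the classifier would receive gradients.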
