Generalized Category Discovery (GCD) aims to classify unlabelled images from both 'seen' and 'unseen' classes by transferring knowledge from a set of labelled 'seen' class images. A key theme in existing GCD approaches is adapting large-scale pre-trained models for the GCD task. An alternative perspective, however, is to adapt the data representation itself for better alignment with the pre-trained model. As such, in this paper, we introduce a two-stage adaptation approach termed SPTNet, which iteratively optimizes model parameters (i.e., model finetuning) and data parameters (i.e., prompt learning). Furthermore, we propose a novel spatial prompt tuning method (SPT) which considers the spatial property of image data, enabling the method to better focus on object parts, which can transfer between seen and unseen classes. We thoroughly evaluate SPTNet on standard benchmarks and demonstrate that it outperforms existing GCD methods. Notably, our method achieves an average accuracy of 61.4% on the Semantic Shift Benchmark (SSB), surpassing prior state-of-the-art methods by approximately 10%. The improvement is particularly remarkable because our method introduces extra parameters amounting to only 0.117% of those in the backbone architecture. Project page: https://visual-ai.github.io/sptnet.
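The alternating scheme described in the abstract (update the data parameters while the model is frozen, then update the model while the prompt is frozen) can be illustrated with a toy sketch. This is not SPTNet: the scalar "model" weight `w`, the additive input "prompt" `p`, the squared-error loss, and all hyperparameters are hypothetical stand-ins chosen only to show the two-stage loop in a few lines of dependency-free Python.

```python
# Toy illustration of alternating optimization over two parameter groups.
# NOT the SPTNet implementation: w (a scalar "model"), p (an additive input
# "prompt"), and the squared-error loss are hypothetical stand-ins.

def loss(w, p, data):
    """Mean squared error of the toy predictor w * (x + p)."""
    return sum((w * (x + p) - y) ** 2 for x, y in data) / len(data)

def grad_w(w, p, data):
    """Gradient of the loss with respect to the model parameter w."""
    return sum(2 * (w * (x + p) - y) * (x + p) for x, y in data) / len(data)

def grad_p(w, p, data):
    """Gradient of the loss with respect to the prompt parameter p."""
    return sum(2 * (w * (x + p) - y) * w for x, y in data) / len(data)

def alternate(data, rounds=50, steps=20, lr=0.01):
    """Alternate between prompt updates and model updates."""
    w, p = 1.0, 0.0
    history = [loss(w, p, data)]
    for _ in range(rounds):
        # Stage 1: freeze the model, update only the prompt (data parameters).
        for _ in range(steps):
            p -= lr * grad_p(w, p, data)
        # Stage 2: freeze the prompt, update only the model parameters.
        for _ in range(steps):
            w -= lr * grad_w(w, p, data)
        history.append(loss(w, p, data))
    return w, p, history

# Synthetic targets generated by y = 2 * (x + 0.5), so w = 2, p = 0.5 fits exactly.
data = [(x, 2.0 * (x + 0.5)) for x in (-1.0, 0.0, 1.0, 2.0)]
w, p, history = alternate(data)
```

Here `history` records the loss after each round, so comparing `history[0]` with `history[-1]` shows how far the alternating updates reduced the loss. In the setting the abstract describes, the model stage would presumably finetune a pre-trained backbone and the prompt stage would update learnable spatial prompts on the input images; both are replaced by scalars in this sketch.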