In this paper, we address the problem of generalized category discovery (GCD), i.e., given a set of images of which only some are labelled, the task is to automatically cluster the unlabelled images by leveraging information from the labelled data, where the unlabelled data contain images from both the labelled classes and novel ones. GCD is similar to semi-supervised learning (SSL) but is more realistic and challenging, as SSL assumes that all unlabelled images belong to the same classes as the labelled ones. We also do not assume that the number of classes in the unlabelled data is known a priori, which makes the GCD problem even harder. To tackle GCD without knowing the class number, we propose an EM-like framework that alternates between representation learning and class number estimation. We propose a semi-supervised variant of the Gaussian Mixture Model (GMM) with a stochastic splitting and merging mechanism that dynamically determines the prototypes by examining cluster compactness and separability. With these prototypes, we apply prototypical contrastive learning for representation learning on the partially labelled data, subject to the constraints imposed by the labelled data. Our framework alternates between these two steps until convergence. The cluster assignment for an unlabelled instance can then be retrieved by identifying its nearest prototype. We comprehensively evaluate our framework on both generic image classification datasets and challenging fine-grained object recognition datasets, achieving state-of-the-art performance.
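The final step described in the abstract, retrieving the cluster assignment of an unlabelled instance from its nearest prototype, can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the embedding dimension, or any function names, so everything below is an assumption made for illustration: Euclidean distance, toy 2-D embeddings, and the hypothetical helpers `nearest_prototype` and `assign_clusters`. It is not the authors' implementation.

```python
import math


def nearest_prototype(embedding, prototypes):
    """Return the index of the prototype closest to `embedding`.

    Assumption: Euclidean distance; the abstract does not name the metric.
    """
    best_index, best_dist = -1, math.inf
    for index, proto in enumerate(prototypes):
        dist = math.dist(embedding, proto)  # available in Python 3.8+
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def assign_clusters(embeddings, prototypes):
    """Assign every embedding to the cluster of its nearest prototype."""
    return [nearest_prototype(e, prototypes) for e in embeddings]


# Toy data (hypothetical): three prototypes and three unlabelled embeddings.
prototypes = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
unlabelled = [(0.2, -0.1), (4.6, 5.3), (0.4, 4.1)]

assignments = assign_clusters(unlabelled, prototypes)
print(assignments)
```

In a full pipeline the prototypes would come from the semi-supervised GMM step and the embeddings from the learned representation; here both are hard-coded so the assignment rule can be read in isolation.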