Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery

Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data, drawing in part on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach clusters all data and learns concepts through prototypical contrastive learning. However, existing methods hinge largely on the performance of the clustering algorithm and therefore inherit its limitations. First, the estimated number of clusters is often smaller than the ground truth, leaving existing methods with too few prototypes for comprehensive concept learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand the set of cluster prototypes (centers). As there is no ground truth for the potential prototypes, we develop a self-supervised prototype learning framework to optimize them in an end-to-end fashion. Second, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this cost. To counteract this inefficiency, we cluster only the unlabelled instances and then expand the resulting cluster prototypes with our potential prototypes to quickly explore novel classes. Despite the simplicity of our method, extensive empirical analysis on a wide range of datasets confirms that it consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by a significant margin of 9.7% on the Stanford Cars dataset and achieves 12x clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at https://github.com/xjtuYW/PNP.git.
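
The prototype-expansion idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`expand_prototypes`, `assign`), the random initialization of the potential prototypes, the `num_potential` parameter, and the cosine-similarity assignment are all assumptions made for illustration. The self-supervised objective that would actually train the potential prototypes is omitted; the sketch only shows how centers obtained by clustering the unlabelled instances might be extended with extra prototype slots.

```python
import math
import random


def normalize(v):
    """Return v scaled to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def expand_prototypes(cluster_centers, num_potential, seed=0):
    """Append `num_potential` extra prototypes to the cluster centers.

    Assumption: the potential prototypes start as random unit vectors.
    In a real system they would be learnable parameters updated by a
    training objective, which this sketch does not include.
    """
    rng = random.Random(seed)
    dim = len(cluster_centers[0])
    potential = [
        normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(num_potential)
    ]
    return [normalize(c) for c in cluster_centers] + potential


def assign(feature, prototypes):
    """Return the index of the prototype most similar to `feature`."""
    sims = [cosine(feature, p) for p in prototypes]
    return max(range(len(prototypes)), key=sims.__getitem__)


# Hypothetical centers, standing in for the output of clustering
# the unlabelled instances only.
centers = [[1.0, 0.0], [0.0, 1.0]]
prototypes = expand_prototypes(centers, num_potential=3)
print(len(prototypes))  # 2 cluster centers + 3 potential prototypes
```

In this sketch an instance can be assigned either to a clustered center or to one of the added slots, which is how an underestimated cluster count could in principle be compensated for once the extra prototypes have been trained.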
