Novel class discovery (NCD) aims to learn a model that transfers common knowledge from a labelled dataset to a class-disjoint unlabelled dataset and discovers new classes (clusters) within it. Many methods, together with elaborate training pipelines and carefully designed objectives, have been proposed and have considerably boosted performance on NCD tasks. Nevertheless, we find that existing methods do not fully exploit the essence of the NCD setting. To address this, we propose to model both inter-class and intra-class constraints in NCD based on the symmetric Kullback-Leibler divergence (sKLD). Specifically, we propose an inter-class sKLD constraint that exploits the disjoint relationship between labelled and unlabelled classes, enforcing separability between different classes in the embedding space. In addition, we present an intra-class sKLD constraint that explicitly constrains the relationship between a sample and its augmentations while also stabilising the training process. Extensive experiments on the popular CIFAR10, CIFAR100 and ImageNet benchmarks demonstrate that our method establishes a new state of the art with significant performance improvements over previous state-of-the-art methods, e.g., 3.5%/3.7% clustering accuracy improvements on the CIFAR100-50 dataset split under the task-aware/-agnostic evaluation protocols. Code is available at https://github.com/FanZhichen/NCD-IIC.
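To make the two constraints concrete, the following is a minimal sketch of a symmetric KL divergence between class-probability vectors, not the authors' implementation. It assumes sKLD is the average of the two directed KL terms; the abstract does not state the weighting. The logits and the helper names `softmax`, `kl` and `skld` are hypothetical and chosen only for illustration. Under the abstract's description, the intra-class term (a sample versus its augmentation) would be kept small, while the inter-class term (a labelled sample versus an unlabelled sample) would be pushed up.

```python
import math


def softmax(logits):
    """Convert raw logits to a probability vector (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """Directed KL divergence KL(p || q); eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def skld(p, q):
    """Symmetric KL divergence, assumed here to be the mean of both directions."""
    return 0.5 * (kl(p, q) + kl(q, p))


# Hypothetical predictions over four classes (illustrative values only).
labelled = softmax([4.0, 1.0, 0.5, 0.2])        # a labelled sample
unlabelled = softmax([0.3, 0.4, 3.5, 1.0])      # an unlabelled sample
unlabelled_aug = softmax([0.2, 0.5, 3.3, 1.1])  # an augmentation of that sample

intra = skld(unlabelled, unlabelled_aug)  # intra-class term: to be minimised
inter = skld(labelled, unlabelled)        # inter-class term: to be maximised

print(f"intra-class sKLD: {intra:.4f}")
print(f"inter-class sKLD: {inter:.4f}")
```

In this toy setting the intra-class value is much smaller than the inter-class value, which is the relationship the two constraints are meant to encourage during training.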