Novel Class Discovery (NCD) aims to identify new categories in unlabeled data by leveraging knowledge acquired from previously learned categories. However, existing NCD methods often struggle to balance performance between old and new categories. Discovering unlabeled new categories in a class-incremental way is more practical but also more challenging, as it is frequently hindered either by catastrophic forgetting of old categories or by an inability to learn new ones. Furthermore, applying NCD to continuously scalable graph-structured data remains under-explored. To address these challenges, we introduce, for the first time, a more practical NCD scenario for node classification (i.e., NC-NCD), and propose SWORD, a novel self-training framework with prototype replay and distillation tailored to our NC-NCD setting. Our approach enables the model to cluster unlabeled new-category nodes after learning labeled nodes, while preserving performance on old categories without relying on old-category nodes. SWORD achieves this by employing a self-training strategy to learn new categories and by jointly using feature prototypes and knowledge distillation to prevent forgetting of old categories. Extensive experiments on four common benchmarks demonstrate the superiority of SWORD over state-of-the-art methods.