Novel Class Discovery (NCD) aims to infer novel classes in an unlabeled set by leveraging prior knowledge from a labeled set with known classes. Despite its importance, NCD lacks theoretical foundations. This paper bridges this gap by providing an analytical framework to formalize and investigate when and how known classes can help discover novel classes. Tailored to the NCD problem, we introduce a graph-theoretic representation that can be learned by a novel NCD Spectral Contrastive Loss (NSCL). Minimizing this objective is equivalent to factorizing the graph's adjacency matrix, which allows us to derive a provable error bound and provide a sufficient and necessary condition for NCD. Empirically, NSCL can match or outperform several strong baselines on common benchmark datasets, making it appealing for practical use while retaining theoretical guarantees.
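The statement that minimizing the objective is equivalent to factorizing the graph's adjacency matrix can be illustrated with a small numerical sketch. The abstract does not spell out the NSCL objective or how the graph is built, so the code below only shows the generic idea under stated assumptions: a hypothetical 6-node toy graph with two groups of samples, the symmetric normalization D^{-1/2} A D^{-1/2}, and a low-rank factorization loss ||A_norm - F F^T||_F^2 minimized in closed form through an eigendecomposition. The function names (`normalized_adjacency`, `factorization_loss`, `spectral_embedding`) and all weights are illustrative, not taken from the paper.

```python
import numpy as np


def normalized_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a symmetric, nonnegative weight matrix A."""
    degrees = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    return A * inv_sqrt[:, None] * inv_sqrt[None, :]


def factorization_loss(A_norm, F):
    """Squared Frobenius error ||A_norm - F F^T||_F^2."""
    return float(np.sum((A_norm - F @ F.T) ** 2))


def spectral_embedding(A_norm, k):
    """A minimizer of the factorization loss over n-by-k matrices F.

    Built from the k largest eigenvalues of A_norm (clipped at zero, since
    F F^T is positive semidefinite) and their eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(A_norm)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest
    lam = np.clip(eigvals[top], 0.0, None)
    return eigvecs[:, top] * np.sqrt(lam)      # scale each column by sqrt(eigenvalue)


# Hypothetical toy graph (an assumption for illustration): samples 0-2 form one
# group and samples 3-5 another. Edges are strong within a group, weak across.
A = np.full((6, 6), 0.05)
A[:3, :3] = 1.0
A[3:, 3:] = 1.0

A_norm = normalized_adjacency(A)
F = spectral_embedding(A_norm, k=2)

# Each row of F is the learned representation of one sample. Here the sign of
# the second coordinate is enough to separate the two groups.
labels = (F[:, 1] > 0).astype(int)

print("factorization loss:", factorization_loss(A_norm, F))
print("cluster labels:", labels)
```

Because this toy matrix has rank 2, a rank-2 factorization reproduces it up to floating-point error, and the rows of `F` coincide within each group. Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector returned by the solver.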