While novel class discovery has recently made great progress, existing methods typically focus on improving algorithms on class-balanced benchmarks. In real-world recognition tasks, however, the class distributions of the corresponding datasets are often imbalanced, which leads to serious performance degradation of those methods. In this paper, we consider a more realistic setting for novel class discovery in which the distributions of both novel and known classes are long-tailed. The main challenge of this new problem is to discover imbalanced novel classes with the help of long-tailed known classes. To tackle this problem, we propose an adaptive self-labeling strategy based on an equiangular prototype representation of classes. Our method infers high-quality pseudo-labels for the novel classes by solving a relaxed optimal transport problem and effectively mitigates the class biases in learning the known and novel classes. We perform extensive experiments on the CIFAR100, ImageNet100, Herbarium19 and large-scale iNaturalist18 datasets, and the results demonstrate the superiority of our method. Our code is available at https://github.com/kleinzcy/NCDLR.
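To make the two ingredients named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation (see the linked repository for that). It assumes that "equiangular prototypes" can be illustrated with the standard simplex equiangular tight frame construction, and it replaces the relaxed optimal transport problem with the simpler balanced, entropy-regularized Sinkhorn iteration under an assumed class prior. The function names, the parameters `epsilon` and `n_iters`, and the example prior are all hypothetical.

```python
import numpy as np


def equiangular_prototypes(num_classes: int, dim: int, seed: int = 0) -> np.ndarray:
    """Return (num_classes, dim) unit-norm prototypes whose pairwise
    cosine similarity is -1 / (num_classes - 1) (simplex ETF)."""
    if dim < num_classes:
        raise ValueError("this construction needs dim >= num_classes")
    rng = np.random.default_rng(seed)
    # Orthonormal columns U (dim x K) from a reduced QR decomposition.
    u, _ = np.linalg.qr(rng.standard_normal((dim, num_classes)))
    k = num_classes
    centering = np.eye(k) - np.ones((k, k)) / k
    # M = sqrt(K / (K - 1)) * U (I - 11^T / K); columns are the prototypes.
    m = np.sqrt(k / (k - 1)) * u @ centering
    return m.T


def sinkhorn_pseudo_labels(logits: np.ndarray, class_prior: np.ndarray,
                           epsilon: float = 0.05, n_iters: int = 50) -> np.ndarray:
    """Balanced entropic OT (Sinkhorn) soft pseudo-labels.

    Assumption: the class marginal `class_prior` is given. The abstract
    refers to a *relaxed* OT problem, which this sketch does not implement.
    """
    n = logits.shape[0]
    # Gibbs kernel; subtracting the max keeps the exponent non-positive.
    q = np.exp((logits - logits.max()) / epsilon)
    row_marginal = np.full(n, 1.0 / n)
    for _ in range(n_iters):
        q *= (class_prior / q.sum(axis=0))[None, :]   # match class marginal
        q *= (row_marginal / q.sum(axis=1))[:, None]  # match sample marginal
    return q * n  # rescale so each row sums to 1


# Usage with random unit-norm features and a hypothetical long-tailed prior.
prototypes = equiangular_prototypes(num_classes=5, dim=16)
rng = np.random.default_rng(1)
features = rng.standard_normal((40, 16))
features /= np.linalg.norm(features, axis=1, keepdims=True)
prior = np.array([0.4, 0.3, 0.15, 0.1, 0.05])
soft_labels = sinkhorn_pseudo_labels(features @ prototypes.T, prior)
```

Fixing the prototypes in advance means no class direction can dominate the others geometrically, which is one common motivation for this construction under class imbalance. A relaxed formulation would loosen the class-marginal constraint rather than requiring a known prior as this sketch does.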