In this paper, we study a practical yet challenging task, On-the-fly Category Discovery (OCD), which aims to discover, online, the categories of newly arriving stream data belonging to both known and unknown classes, leveraging only the known-category knowledge contained in labeled data. Previous OCD methods employ hash-based techniques, representing old and new categories by hash codes for instance-wise inference. However, directly mapping features into a low-dimensional hash space not only inevitably damages the ability to distinguish classes but also causes a "high sensitivity" issue, especially for fine-grained classes, leading to inferior performance. To address these issues, we propose a novel Prototypical Hash Encoding (PHE) framework consisting of Category-aware Prototype Generation (CPG) and Discriminative Category Encoding (DCE), which mitigates the sensitivity of hash codes while preserving the rich discriminative information contained in the high-dimensional feature space, in a two-stage projection fashion. CPG enables the model to fully capture intra-category diversity by representing each category with multiple prototypes. DCE boosts the discriminative ability of the hash codes under the guidance of the generated category prototypes and a minimum separation distance constraint. By jointly optimizing CPG and DCE, we demonstrate that the two components are mutually beneficial for effective OCD. Extensive experiments show that PHE significantly outperforms previous methods, e.g., improving ALL ACC by +5.3% averaged over all datasets. Moreover, owing to the interpretable nature of the prototypes, we visually analyze the underlying mechanism by which PHE groups samples into either known or unknown categories. Code is available at https://github.com/HaiyangZheng/PHE.
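To make the hash-based, instance-wise inference and the "high sensitivity" issue concrete, the following is a minimal sketch and not the PHE implementation. It assumes that codes are obtained by sign binarization, that each known category is summarized by a single reference code, and that a sample is assigned to the nearest category code within a Hamming radius or otherwise opens a new category. The names `to_hash_code`, `hamming`, `OnTheFlyAssigner`, and `radius` are hypothetical and chosen for illustration only.

```python
from typing import Dict, Optional, Sequence, Tuple

Code = Tuple[int, ...]


def to_hash_code(projection: Sequence[float]) -> Code:
    """Binarize a real-valued projection by sign (assumed encoding: +1 / -1)."""
    return tuple(1 if v >= 0 else -1 for v in projection)


def hamming(a: Code, b: Code) -> int:
    """Number of positions at which two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))


class OnTheFlyAssigner:
    """Illustrative instance-wise assigner (hypothetical, not the authors' code).

    A sample joins the nearest stored category code if it lies within
    `radius` bits; otherwise its own code is registered as a new category.
    """

    def __init__(self, known_codes: Dict[str, Code], radius: int) -> None:
        self.codes: Dict[str, Code] = dict(known_codes)
        self.radius = radius
        self.num_new = 0

    def assign(self, projection: Sequence[float]) -> str:
        code = to_hash_code(projection)
        best_label: Optional[str] = None
        best_dist: Optional[int] = None
        for label, ref in self.codes.items():
            d = hamming(code, ref)
            if best_dist is None or d < best_dist:
                best_label, best_dist = label, d
        if best_label is not None and best_dist is not None and best_dist <= self.radius:
            return best_label
        # No stored code is close enough: discover a new category on the fly.
        self.num_new += 1
        new_label = f"new_{self.num_new}"
        self.codes[new_label] = code
        return new_label


# Toy 6-bit codes for two known fine-grained classes (made-up values).
KNOWN = {"sparrow": (1, 1, 1, 1, 1, 1), "finch": (-1, -1, -1, 1, 1, 1)}

# One weak feature dimension (-0.1) flips a single bit relative to "sparrow".
noisy_sparrow = [0.9, 0.8, -0.1, 0.7, 0.6, 0.5]

tolerant = OnTheFlyAssigner(KNOWN, radius=1)
strict = OnTheFlyAssigner(KNOWN, radius=0)  # exact-match lookup

print(tolerant.assign(noisy_sparrow))  # sparrow
print(strict.assign(noisy_sparrow))    # new_1
```

With exact matching (`radius=0`), a single flipped bit sends a known-class sample to a spurious new category, which is one way to read the sensitivity problem described above. In this toy setup the two known codes are 3 bits apart, so a radius of 1 keeps their neighborhoods disjoint; this is only an informal illustration of why a minimum separation between category codes is useful, not a statement of how DCE enforces it.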