Hierarchical open-set classification handles previously unseen classes by assigning them to the most appropriate high-level category in a class taxonomy. We extend this paradigm to the semi-supervised setting, enabling the use of large-scale, uncurated datasets containing a mixture of known and unknown classes to improve hierarchical open-set performance. To this end, we propose a teacher-student framework based on pseudo-labeling. Two key components are introduced: 1) subtree pseudo-labels, which provide reliable supervision in the presence of unknown data, and 2) age-gating, a mechanism that mitigates overconfidence in pseudo-labels. Experiments show that our framework outperforms self-supervised pretraining followed by supervised adaptation, and even matches its fully supervised counterpart when using only 20 labeled samples per class on the iNaturalist19 benchmark. Our code is available at https://github.com/walline/semihoc.
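To make the subtree pseudo-label idea concrete, here is a minimal, purely illustrative sketch. It is an assumption-laden reconstruction, not the paper's actual method: the toy taxonomy, the probability-aggregation rule, and the confidence threshold are all invented for illustration. The intuition shown is that when a teacher's leaf-level prediction is ambiguous, the pseudo-label backs off to the deepest taxonomy node that still accumulates enough probability mass.

```python
# Hypothetical sketch of subtree pseudo-labeling (names, taxonomy, and the
# thresholding rule are illustrative assumptions, not the paper's method).
# Given a teacher's probability distribution over leaf classes, aggregate
# leaf probabilities up the class taxonomy and pseudo-label the sample with
# the deepest node whose aggregated mass clears a confidence threshold.

from collections import defaultdict

# Toy taxonomy as a parent map; "root" has no parent.
PARENT = {
    "cat": "mammal", "dog": "mammal",
    "sparrow": "bird", "eagle": "bird",
    "mammal": "root", "bird": "root",
}

def ancestors(node):
    """Yield node and all of its ancestors, ending at the root."""
    while node != "root":
        yield node
        node = PARENT[node]
    yield "root"

def subtree_pseudo_label(leaf_probs, threshold=0.7):
    """Return the deepest taxonomy node whose aggregated probability
    reaches `threshold` (the root always qualifies, with mass 1.0)."""
    mass = defaultdict(float)
    for leaf, p in leaf_probs.items():
        for node in ancestors(leaf):
            mass[node] += p
    # Depth = number of edges from the node to the root.
    depth = {n: sum(1 for _ in ancestors(n)) - 1 for n in mass}
    candidates = [n for n, m in mass.items() if m >= threshold]
    return max(candidates, key=lambda n: depth[n])

# A confident leaf prediction keeps its leaf label...
print(subtree_pseudo_label({"cat": 0.8, "dog": 0.1, "sparrow": 0.05, "eagle": 0.05}))   # cat
# ...while an ambiguous one backs off to the parent category.
print(subtree_pseudo_label({"cat": 0.45, "dog": 0.45, "sparrow": 0.05, "eagle": 0.05}))  # mammal
```

Backing off in this way means an unknown-class sample whose leaf probabilities are spread across one branch can still receive useful coarse supervision instead of a wrong leaf label.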