Hierarchical classification (HC) assigns each object multiple labels organized into a hierarchical structure. Existing deep-learning-based HC methods usually classify an instance by starting from the root node and descending until a leaf node is reached. In the real world, however, images degraded by noise, occlusion, blur, or low resolution may not provide sufficient information for classification at subordinate levels. To address this issue, we propose a novel semantic-guided level-category hybrid prediction network (SGLCHPN) that jointly performs level and category prediction in an end-to-end manner. SGLCHPN comprises two modules: a visual transformer that extracts feature vectors from the input images, and a semantic-guided cross-attention module that uses category word embeddings as queries to guide the learning of category-specific representations. To evaluate the proposed method, we construct two new datasets in which images span a broad range of quality and are therefore labeled to different levels (depths) in the hierarchy according to their individual quality. Experimental results demonstrate the effectiveness of the proposed HC method.
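To make the role of the cross-attention module more concrete, the following is a minimal sketch of attention in which category embeddings act as queries over image features. It is not the SGLCHPN implementation: the function name `semantic_cross_attention`, the toy vectors, and the simplifications (a single head, no learned projections, keys equal to values, pure Python instead of a deep learning framework) are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def semantic_cross_attention(category_embeddings, patch_features):
    """Return one attended feature vector per category, plus the attention weights.

    category_embeddings: C query vectors (stand-ins for category word embeddings).
    patch_features:      N key/value vectors (stand-ins for visual transformer outputs).
    Assumptions: single head, no learned projections, keys == values.
    """
    dim = len(patch_features[0])
    scale = math.sqrt(dim)
    representations, weights = [], []
    for query in category_embeddings:
        # Scaled dot-product scores of this category against every patch.
        attn = softmax([dot(query, patch) / scale for patch in patch_features])
        # Category-specific representation: attention-weighted sum of patches.
        attended = [
            sum(w * patch[i] for w, patch in zip(attn, patch_features))
            for i in range(dim)
        ]
        representations.append(attended)
        weights.append(attn)
    return representations, weights


# Toy inputs (hypothetical values): three 2-d patch features, two categories.
patch_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
category_embeddings = [[4.0, 0.0], [0.0, 4.0]]
reps, attn = semantic_cross_attention(category_embeddings, patch_features)
```

Each row of `attn` is a distribution over patches, so every category obtains its own pooled view of the same image. In a trained model, the queries, keys, and values would pass through learned projections, and the resulting representations would feed the level and category prediction heads.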