Zero-shot learning (ZSL) aims to leverage additional semantic information to recognize unseen classes. To transfer knowledge from seen to unseen classes, most ZSL methods learn a shared embedding space by simply aligning visual embeddings with semantic prototypes. However, methods trained under this paradigm often struggle to learn a robust embedding space because they align the two modalities in an isolated manner for each class, ignoring the crucial inter-class relationships during the alignment process. To address these challenges, this paper proposes a Visual-Semantic Graph Matching Net, termed VSGMN, which leverages semantic relationships among classes to aid visual-semantic embedding. VSGMN employs a Graph Build Network (GBN) and a Graph Matching Network (GMN) to achieve two-stage visual-semantic alignment. Specifically, GBN first uses an embedding-based approach to build visual and semantic graphs in the semantic space and aligns each embedding with its prototype for the first-stage alignment. In addition, to supplement unseen-class relations in these graphs, GBN also builds unseen-class nodes based on semantic relationships. In the second stage, GMN continuously integrates neighbor and cross-graph information into the constructed graph nodes and aligns the node relationships between the two graphs under a class-relationship constraint. Extensive experiments on three benchmark datasets demonstrate that VSGMN achieves superior performance in both conventional and generalized ZSL scenarios. The implementation of our VSGMN and the experimental results are available on GitHub: https://github.com/dbwfd/VSGMN
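The core idea of the second-stage alignment can be illustrated with a minimal sketch: build a class-relationship graph for each modality (here, pairwise cosine similarities), then penalize the mismatch between the visual and semantic graphs. This is a simplified, hypothetical NumPy illustration of the relationship-matching principle, not the authors' VSGMN implementation; the function names and the MSE graph-matching loss are assumptions for exposition.

```python
import numpy as np

def cosine_graph(X):
    """Build a class-relationship graph from class embeddings.

    Adjacency entry (i, j) is the cosine similarity between the
    embeddings of class i and class j.
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def relation_alignment_loss(visual_emb, prototypes):
    """Toy second-stage objective: match node relationships.

    Instead of aligning each visual embedding with its prototype in
    isolation, compare the *structure* of the two graphs so that
    inter-class relationships in the visual space mirror those in the
    semantic space. (Hypothetical MSE loss for illustration.)
    """
    G_v = cosine_graph(visual_emb)   # visual class-relationship graph
    G_s = cosine_graph(prototypes)   # semantic class-relationship graph
    return np.mean((G_v - G_s) ** 2)
```

When the visual class embeddings reproduce the semantic prototypes' relationship structure exactly, the loss is zero; any structural mismatch between the two graphs is penalized.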