DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning

Zero-shot learning (ZSL) aims to predict unseen classes whose samples never appear during training. Attributes, which are annotations of class-level visual characteristics, are among the most effective and widely used forms of semantic information for zero-shot image classification. However, current methods often fail to discriminate subtle visual distinctions between images, owing not only to the shortage of fine-grained annotations but also to attribute imbalance and co-occurrence. In this paper, we present DUET, a transformer-based end-to-end ZSL method that integrates latent semantic knowledge from pre-trained language models (PLMs) via a self-supervised multi-modal learning paradigm. Specifically, we (1) develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from images; (2) apply an attribute-level contrastive learning strategy to further enhance the model's discrimination of fine-grained visual characteristics in the face of attribute co-occurrence and imbalance; and (3) propose a multi-task learning policy for considering multi-model objectives. We find that DUET achieves state-of-the-art performance on three standard ZSL benchmarks and a knowledge-graph-equipped ZSL benchmark. Its components are effective and its predictions are interpretable.
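
To make the idea of a contrastive objective concrete, the sketch below implements a generic InfoNCE-style loss in plain Python. It is an illustration only and not DUET's actual loss: the abstract does not specify the formulation, so the function names (`cosine`, `info_nce`), the temperature value, and the toy two-dimensional "embeddings" are all assumptions. The loss is small when the anchor is closer to its positive than to the negatives, and large otherwise.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (hypothetical stand-in, not the paper's loss).

    Computes -log( exp(s_pos / t) / sum_k exp(s_k / t) ), where s_k are
    cosine similarities of the anchor to the positive and all negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for stability
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - logits[0]

# Toy, made-up embeddings (assumed for illustration only).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

# Correct pairing: the anchor is closest to its positive.
loss_good = info_nce(anchor, positive, negatives)
# Wrong pairing: a dissimilar vector is treated as the positive.
loss_bad = info_nce(anchor, negatives[0], [positive, negatives[1]])

print(loss_good < loss_bad)
```

In an attribute-level setting, one might take the anchor and positive to be representations that share a given attribute and the negatives to be representations that lack it, but that pairing scheme is likewise an assumption rather than something stated in the abstract.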

