Zero-shot learning has made consistent and remarkable progress by modeling nuanced one-to-one visual-attribute correlations. Existing studies refine a single uniform mapping function to align and correlate sample regions with subattributes, ignoring two crucial issues: 1) the inherent asymmetry of attributes; and 2) the unutilized channel information. This paper addresses these issues with a simple yet effective approach, dubbed Dual Expert Distillation Network (DEDN), in which two experts are dedicated to coarse- and fine-grained visual-attribute modeling, respectively. Concretely, the coarse expert, cExp, has a complete perceptual scope and coordinates visual-attribute similarity metrics across dimensions, while the fine expert, fExp, consists of multiple specialized subnetworks, each corresponding to an exclusive set of attributes. The two experts cooperatively distill from each other to reach a mutual agreement during training. We further equip DEDN with a newly designed backbone network, the Dual Attention Network (DAN), which incorporates both region and channel attention information to fully exploit visual semantic knowledge. Experiments on various benchmark datasets indicate that DEDN sets a new state of the art.
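The mutual distillation between the two experts can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the loss, so the symmetric KL divergence, the temperature, and the names `softmax`, `kl_divergence`, and `mutual_distillation_loss` are all assumptions made for illustration. The inputs stand for the per-class scores that cExp and fExp would each produce for one sample.

```python
import math


def softmax(scores, temperature=1.0):
    """Turn raw class scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_distillation_loss(coarse_scores, fine_scores, temperature=2.0):
    """Assumed symmetric loss: each expert is pulled toward the other.

    In a real training loop, each direction would typically treat the
    other expert's distribution as a fixed target (no gradient through it).
    """
    p_coarse = softmax(coarse_scores, temperature)
    p_fine = softmax(fine_scores, temperature)
    return kl_divergence(p_coarse, p_fine) + kl_divergence(p_fine, p_coarse)


if __name__ == "__main__":
    # Hypothetical class scores from the two experts for one sample.
    coarse = [2.0, 0.5, -1.0]
    fine = [0.1, 1.5, 0.3]
    print("experts disagree:", mutual_distillation_loss(coarse, fine))
    print("experts agree:   ", mutual_distillation_loss(coarse, coarse))
```

The loss is zero when both experts predict the same class distribution and grows as they diverge, which is one simple way to express "reaching a mutual agreement" as a training signal.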