Generative zero-shot learning (ZSL) trains a generator to synthesize visual samples for unseen classes, which is an effective way to advance ZSL. However, existing generative methods condition the generator on Gaussian noise and a predefined semantic prototype. As a result, the generator is optimized only for specific seen classes rather than characterizing each visual instance, which leads to poor generalization (\textit{e.g.}, overfitting to seen classes). To address this issue, we propose a novel Visual-Augmented Dynamic Semantic prototype method (termed VADS) that helps the generator learn an accurate semantic-visual mapping by fully incorporating visual-augmented knowledge into the semantic conditions. In detail, VADS consists of two modules: (1) the Visual-aware Domain Knowledge Learning module (VDKL) learns the local bias and global prior of the visual features (referred to as domain visual knowledge), which replace pure Gaussian noise to provide richer prior noise information; (2) the Vision-Oriented Semantic Updation module (VOSU) updates the semantic prototype according to the visual representations of the samples. Finally, we concatenate the outputs of the two modules into a dynamic semantic prototype, which serves as the condition of the generator. Extensive experiments demonstrate that VADS achieves superior conventional ZSL (CZSL) and generalized ZSL (GZSL) performance on three prominent datasets and outperforms other state-of-the-art methods, with average improvements of 6.4\%, 5.9\% and 4.2\% on SUN, CUB and AWA2, respectively.
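The way the two module outputs are combined into the generator condition can be illustrated with a minimal sketch. It is not the VADS implementation: the abstract does not specify the internals of VDKL or VOSU, so the class name `DynamicPrototypeSketch`, the linear maps, the additive combination of local bias and global prior, and the residual prototype update are all assumptions chosen for illustration. Only the final concatenation step follows the description above.

```python
import random


def linear(x, weights):
    """Multiply vector x by a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random weights standing in for learned parameters."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class DynamicPrototypeSketch:
    """Hypothetical stand-in for the VDKL + VOSU pipeline (assumed design)."""

    def __init__(self, visual_dim, semantic_dim, knowledge_dim, seed=0):
        rng = random.Random(seed)
        # Assumed VDKL parameters: a per-sample projection ("local bias")
        # and a vector shared across samples ("global prior").
        self.w_local = random_matrix(knowledge_dim, visual_dim, rng)
        self.global_prior = [rng.gauss(0.0, 1.0) for _ in range(knowledge_dim)]
        # Assumed VOSU parameters: maps a visual feature to a prototype offset.
        self.w_update = random_matrix(semantic_dim, visual_dim, rng)

    def domain_visual_knowledge(self, visual_feature):
        # Assumption: local bias and global prior are combined by addition.
        local_bias = linear(visual_feature, self.w_local)
        return [b + p for b, p in zip(local_bias, self.global_prior)]

    def updated_prototype(self, visual_feature, prototype):
        # Assumption: the prototype is updated by a visual-dependent residual.
        delta = linear(visual_feature, self.w_update)
        return [a + d for a, d in zip(prototype, delta)]

    def condition(self, visual_feature, prototype):
        # As described in the abstract: concatenate the two module outputs
        # to form the dynamic semantic prototype fed to the generator.
        knowledge = self.domain_visual_knowledge(visual_feature)
        updated = self.updated_prototype(visual_feature, prototype)
        return knowledge + updated


model = DynamicPrototypeSketch(visual_dim=4, semantic_dim=3, knowledge_dim=2)
class_prototype = [1.0, 0.0, 1.0]
cond_a = model.condition([0.5, -1.0, 0.25, 2.0], class_prototype)
cond_b = model.condition([0.0, 0.0, 0.0, 0.0], class_prototype)
```

In this sketch, two samples that share the same class prototype receive different conditions because both halves of the condition depend on the visual feature. With a static prototype and pure Gaussian noise, the condition would carry no per-instance information.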