Generalized Zero-Shot Learning (GZSL) recognizes unseen classes by transferring knowledge from seen classes, relying on the inherent interactions between visual and semantic data. However, the discrepancy between well-prepared training data and unpredictable real-world test scenarios remains a significant challenge. This paper introduces a dual strategy to address this generalization gap. First, we incorporate semantic information through a novel encoder. The encoder integrates class-specific semantic information by targeting the performance disparity, enhancing the produced features to enrich the semantic space for class-specific attributes. Second, we refine the generative capabilities using a novel compositional loss function. This approach generates discriminative classes, enabling effective classification of both seen and unseen classes. In addition, we further exploit the learned latent space by using controlled semantic inputs, ensuring the model's robustness in varying environments. The resulting model outperforms state-of-the-art models both in generalization and across diverse settings, notably without requiring hyperparameter tuning or domain-specific adaptations. We also propose a set of novel evaluation metrics to provide a more detailed assessment of the reliability and reproducibility of the results. The complete code is available at https://github.com/william-heyden/SEER-ZeroShotLearning/.