Generalized Zero-Shot Learning (GZSL) has become a key research area in computer vision because it enables recognition of object classes that were not seen during training. Generative techniques have made significant progress by converting traditional GZSL into a fully supervised learning problem, but they tend to generate a large number of synthetic features, many of them redundant, which increases training time and decreases accuracy. To address this issue, this paper proposes a novel approach to synthetic feature selection using reinforcement learning. Specifically, we propose a transformer-based selector, trained with proximal policy optimization (PPO), that selects synthetic features using the validation classification accuracy on the seen classes as the reward. The proposed method is model-agnostic and data-agnostic, so it applies to both images and videos and suits diverse applications. Experiments show that our approach improves overall performance over existing feature-generating methods on multiple benchmarks.
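As a rough illustration of the selection idea, the sketch below computes the standard PPO clipped surrogate for a single sampled keep/drop mask over synthetic features. It is not the paper's implementation: the transformer-based selector is replaced by fixed per-feature keep probabilities, and the function names, probabilities, reward, and baseline are all hypothetical placeholders chosen for this example.

```python
import math
import random


def sample_mask(keep_probs, rng):
    """Sample a binary keep/drop decision for each synthetic feature."""
    return [1 if rng.random() < p else 0 for p in keep_probs]


def mask_log_prob(keep_probs, mask):
    """Log-probability of a mask under independent Bernoulli keep decisions."""
    return sum(
        math.log(p) if kept else math.log(1.0 - p)
        for p, kept in zip(keep_probs, mask)
    )


def ppo_clipped_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """PPO clipped surrogate for one sampled action (a quantity to maximize)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


rng = random.Random(0)
old_probs = [0.5, 0.5, 0.5, 0.5]  # hypothetical selector outputs before an update
new_probs = [0.6, 0.4, 0.7, 0.5]  # hypothetical selector outputs after an update
mask = sample_mask(old_probs, rng)

# Placeholder numbers: in the setup described above, the reward would be the
# seen-class validation accuracy of a classifier trained on the kept features.
reward, baseline = 0.72, 0.65
advantage = reward - baseline

objective = ppo_clipped_objective(
    mask_log_prob(new_probs, mask),
    mask_log_prob(old_probs, mask),
    advantage,
)
print(mask, round(objective, 4))
```

The clipping keeps the probability ratio within `1 ± clip_eps`, so a single high-reward mask cannot push the selector arbitrarily far from the policy that sampled it.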