Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. A recent trend in open-set recognition (OSR) shows that generative models benefit discriminative unknown detection. As a type of generative model, energy-based models (EBMs) have the potential for hybrid modeling of generative and discriminative tasks. However, most existing EBMs struggle with density estimation in high-dimensional space, even though accurate density estimation is critical to recognizing images from fine-grained classes. In this paper, we explore a low-dimensional latent space with an energy-based prior distribution for OSR in a fine-grained visual world. Specifically, based on the latent space EBM, we propose an attribute-aware information bottleneck (AIB), a residual attribute feature aggregation (RAFA) module, and an uncertainty-based virtual outlier synthesis (UVOS) module to improve the expressivity, granularity, and density of the samples in fine-grained classes, respectively. Our method is flexible enough to take advantage of recent vision transformers for powerful visual classification and generation. The method is validated on both fine-grained and general visual classification datasets while preserving the capability of generating high-resolution, photo-realistic fake images.