Latent Space Energy-based Model for Fine-grained Open Set Recognition

Fine-grained open-set recognition (FineOSR) aims to recognize images belonging to classes with subtle appearance differences while rejecting images of unknown classes. A recent trend in OSR shows the benefit of generative models to discriminative unknown detection. As a type of generative model, energy-based models (EBM) are the potential for hybrid modeling of generative and discriminative tasks. However, most existing EBMs suffer from density estimation in high-dimensional space, which is critical to recognizing images from fine-grained classes. In this paper, we explore the low-dimensional latent space with energy-based prior distribution for OSR in a fine-grained visual world. Specifically, based on the latent space EBM, we propose an attribute-aware information bottleneck (AIB), a residual attribute feature aggregation (RAFA) module, and an uncertainty-based virtual outlier synthesis (UVOS) module to improve the expressivity, granularity, and density of the samples in fine-grained classes, respectively. Our method is flexible to take advantage of recent vision transformers for powerful visual classification and generation. The method is validated on both fine-grained and general visual classification datasets while preserving the capability of generating photo-realistic fake images with high resolution.

翻译：细粒度开放集识别（FineOSR）旨在识别具有细微外观差异类别的图像，同时拒绝未知类别的图像。开放集识别（OSR）的最新趋势表明，生成模型对判别式未知检测具有优势。作为生成模型的一种，能量模型（EBM）在生成式与判别式任务的混合建模中具有潜力。然而，现有大多数EBM在高维空间中的密度估计存在困难，而这对于识别细粒度类别的图像至关重要。本文探索了具有能量先验分布的低维隐空间，以实现细粒度视觉世界中的OSR。具体而言，基于隐空间EBM，我们提出了一种属性感知信息瓶颈（AIB）、残差属性特征聚合（RAFA）模块和基于不确定性的虚拟异常点合成（UVOS）模块，分别提升细粒度类别中样本的表达性、粒度和密度。我们的方法可灵活利用现代视觉Transformer实现强大的视觉分类与生成能力。该方法在细粒度与通用视觉分类数据集上均得到验证，同时保持生成高分辨率逼真伪图像的能力。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【NeurIPS2021】用于文本图表示学习的 GNN 嵌套 Transformer 模型：GraphFormers

专知会员服务

46+阅读 · 2021年11月24日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

35+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日