Semantic-guided modeling of spatial relation and object co-occurrence for indoor scene recognition

Exploring the semantic context in scene images is essential for indoor scene recognition. However, because spatial layouts vary widely within a class and the same objects coexist across classes, modeling contextual relationships that adapt to varied image characteristics is a great challenge. Existing contextual modeling methods for scene recognition exhibit two limitations: 1) They typically model only one kind of spatial relationship among objects within scenes, in a manually predefined manner, with limited exploration of diverse spatial layouts. 2) They often overlook the differences in coexisting objects across different scenes, which limits scene recognition performance. To overcome these limitations, we propose SpaCoNet, which simultaneously models Spatial relation and Co-occurrence of objects guided by semantic segmentation. First, the Semantic Spatial Relation Module (SSRM) is constructed to model scene spatial features. With the help of semantic segmentation, this module decouples the spatial information from the scene image and thoroughly explores all spatial relationships among objects in an end-to-end manner. Second, both spatial features from the SSRM and deep features from the Image Feature Extraction Module are allocated to each object, so as to distinguish coexisting objects across different scenes. Finally, using these discriminative features, we design a Global-Local Dependency Module to explore the long-range co-occurrence among objects and to generate a semantic-guided feature representation for indoor scene recognition. Experimental results on three widely used scene datasets demonstrate the effectiveness and generality of the proposed method.
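The abstract does not say how features are "allocated to each object". One plausible reading is masked average pooling over the segmentation map, sketched below in plain Python. This is an illustrative assumption, not the SpaCoNet implementation: the function names `pool_features_per_object` and `object_descriptors`, the nested-list feature layout, and the use of concatenation to combine the two feature types are all hypothetical.

```python
from collections import defaultdict


def pool_features_per_object(features, seg_map):
    """Average the feature vectors of all pixels sharing a semantic label.

    features: H x W grid of feature vectors (lists of floats).
    seg_map:  H x W grid of integer class labels from a segmentation model.
    Returns a dict mapping each label to its mean feature vector.
    NOTE: masked average pooling is an assumption made for illustration.
    """
    sums = {}
    counts = defaultdict(int)
    for feat_row, label_row in zip(features, seg_map):
        for vec, label in zip(feat_row, label_row):
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def object_descriptors(spatial_feats, image_feats, seg_map):
    """Build one descriptor per object class by concatenating its pooled
    spatial features and pooled deep image features (hypothetical fusion)."""
    spatial = pool_features_per_object(spatial_feats, seg_map)
    deep = pool_features_per_object(image_feats, seg_map)
    return {label: spatial[label] + deep[label] for label in spatial}


# Toy 2 x 2 scene: the top row is class 0, the bottom row is class 1.
seg_map = [[0, 0], [1, 1]]
spatial_feats = [[[0.0], [2.0]], [[4.0], [4.0]]]
image_feats = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]

descriptors = object_descriptors(spatial_feats, image_feats, seg_map)
print(descriptors)
```

Under this reading, each semantic class ends up with a single vector that mixes where the object sits with what it looks like. A downstream module that relates these per-object vectors to one another could then model co-occurrence among objects.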

