Semantic-guided spatial relation and object co-occurrence modeling for indoor scene recognition

Exploring the semantic context in scene images is essential for indoor scene recognition. However, due to the diverse intra-class spatial layouts and the coexisting inter-class objects, modeling contextual relationships to adapt various image characteristics is a great challenge. Existing contextual modeling methods for indoor scene recognition exhibit two limitations: 1) During training, space-independent information, such as color, may hinder optimizing the network's capacity to represent the spatial context. 2) These methods often overlook the differences in coexisting objects across different scenes, suppressing scene recognition performance. To address these limitations, we propose SpaCoNet, which simultaneously models the Spatial relation and Co-occurrence of objects based on semantic segmentation. Firstly, the semantic spatial relation module (SSRM) is designed to explore the spatial relation among objects within a scene. With the help of semantic segmentation, this module decouples the spatial information from the image, effectively avoiding the influence of irrelevant features. Secondly, both spatial context features from the SSRM and deep features from the Image Feature Extraction Module are used to distinguish the coexisting object across different scenes. Finally, utilizing the discriminative features mentioned above, we employ the self-attention mechanism to explore the long-range co-occurrence among objects, and further generate a semantic-guided feature representation for indoor scene recognition. Experimental results on three widely used scene datasets demonstrate the effectiveness and generality of the proposed method. The code will be made publicly available after the blind review process is completed.

翻译：探索场景图像中的语义上下文对于室内场景识别至关重要。然而，由于类内空间布局的多样性以及类间对象的共存性，建模上下文关系以适应各种图像特征是一个巨大挑战。现有室内场景识别的上下文建模方法存在两个局限性：1）训练过程中，颜色等与空间无关的信息可能阻碍网络表征空间上下文能力的优化。2）这些方法常忽略不同场景中共存对象的差异，从而抑制了场景识别性能。为解决这些问题，我们提出SpaCoNet，该方法基于语义分割同时建模对象的空间关系与共现性。首先，设计语义空间关系模块（SSRM）以探索场景内对象间的空间关系。借助语义分割，该模块从图像中解耦空间信息，有效避免无关特征的影响。其次，利用SSRM提取的空间上下文特征和图像特征提取模块提取的深层特征，区分不同场景中的共存对象。最后，基于上述判别性特征，采用自注意力机制探索对象间的长距离共现性，并生成语义引导的特征表示用于室内场景识别。在三个广泛使用的场景数据集上的实验结果表明了所提方法的有效性和泛化性。代码将在盲审流程完成后公开。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

《生成式模型: 变分自编码器与扩散模型》，75页ppt，Google DeepMind科学家Ruiqi Gao

专知会员服务

66+阅读 · 2023年6月10日

【CVPR 2022】一个完全无监督的框架，从噪声和部分测量中学习图像，Robust Equivariant Imaging: a fully unsupervised framework for learning to image

专知会员服务

25+阅读 · 2022年3月3日

【ACL2020】多模态信息抽取，365页ppt

专知会员服务

151+阅读 · 2020年7月6日

生成性对抗网络:理论模型、评估指标和最近发展的概述，Generative Adversarial Networks (GANs): An Overview of Theoretical Model, Evaluation Metrics, and Recent Developments

专知会员服务

42+阅读 · 2020年5月30日