Semantic-guided context modeling for indoor scene recognition

Exploring the semantic context in scene images is essential for indoor scene recognition. However, because of diverse intra-class spatial layouts and objects that coexist across scene classes, modeling contextual relationships that adapt to varied image characteristics is a major challenge. Existing contextual modeling methods for indoor scene recognition have two limitations: 1) during training, space-independent information, such as color, may hinder the optimization of the network's ability to represent spatial context; 2) these methods often overlook how coexisting objects differ across scenes, which limits scene recognition performance. To address these limitations, we propose SpaCoNet, a novel approach that simultaneously models the Spatial relations and Co-occurrence of objects based on semantic segmentation. First, the semantic spatial relation module (SSRM) is designed to explore the spatial relations among objects within a scene. With the help of semantic segmentation, this module decouples spatial information from the image, effectively avoiding the influence of irrelevant features. Second, both the spatial context features from the SSRM and the deep features from an RGB feature extractor are used to distinguish coexisting objects across different scenes. Finally, using these discriminative features, we employ the self-attention mechanism to explore long-range co-occurrence relationships among objects and generate a semantic-guided feature representation for indoor scene recognition. Experimental results on three publicly available datasets demonstrate the effectiveness and generality of the proposed method. The code will be made publicly available after the blind-review process is completed.
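The abstract does not specify SpaCoNet's layer-level design, so the following is only a minimal, dependency-free sketch of the general idea, and every name in it (`object_tokens`, `self_attention`, `scene_descriptor`) is hypothetical. It assumes a per-pixel class-label map from a semantic segmentation model and a per-pixel RGB feature map at the same resolution. Each semantic class present in the image becomes one object token that joins the class's mean RGB feature with spatial cues computed from the label map alone (normalized centroid and area fraction, a simple stand-in for the learned SSRM). Single-head scaled dot-product self-attention over these tokens then mixes information between co-occurring objects before mean pooling yields a scene descriptor.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def object_tokens(seg: Sequence[Sequence[int]],
                  feats: Sequence[Sequence[Vector]]) -> Dict[int, Vector]:
    """Build one token per semantic class present in the label map.

    Hypothetical stand-in, not the paper's SSRM: each token concatenates
    (a) the masked average of the RGB feature vectors inside the class region
    and (b) spatial cues taken from the label map only (no color involved):
    normalized centroid row, normalized centroid column, and area fraction.
    """
    h, w = len(seg), len(seg[0])
    sums: Dict[int, Vector] = {}
    rows: Dict[int, float] = {}
    cols: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for i in range(h):
        for j in range(w):
            c, f = seg[i][j], feats[i][j]
            if c not in sums:
                sums[c] = [0.0] * len(f)
                rows[c] = cols[c] = 0.0
                counts[c] = 0
            sums[c] = [s + x for s, x in zip(sums[c], f)]
            rows[c] += i
            cols[c] += j
            counts[c] += 1
    tokens: Dict[int, Vector] = {}
    for c, n in counts.items():
        appearance = [s / n for s in sums[c]]
        spatial = [rows[c] / n / max(h - 1, 1),   # centroid row in [0, 1]
                   cols[c] / n / max(w - 1, 1),   # centroid column in [0, 1]
                   n / (h * w)]                   # fraction of the image
        tokens[c] = appearance + spatial
    return tokens


def self_attention(x: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V
    projections (a simplification; a trained model learns these weights)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        m = max(scores)                        # subtract max for stability
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(wt / z * v[t] for wt, v in zip(weights, x))
                     for t in range(d)])
    return out


def scene_descriptor(seg: Sequence[Sequence[int]],
                     feats: Sequence[Sequence[Vector]]) -> Vector:
    """Attend over object tokens (ordered by class id), then mean-pool."""
    tokens = object_tokens(seg, feats)
    attended = self_attention([tokens[c] for c in sorted(tokens)])
    k = len(attended)
    return [sum(tok[t] for tok in attended) / k
            for t in range(len(attended[0]))]


if __name__ == "__main__":
    # Toy 2x3 image with three classes and 2-dim RGB features per pixel.
    seg = [[0, 0, 1],
           [0, 2, 1]]
    feats = [[[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
             [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]]
    desc = scene_descriptor(seg, feats)
    print(len(desc))  # 5: 2 appearance dims + 3 spatial dims
```

Fixed identity projections keep the example self-contained; in a trained network the query, key, and value projections, the spatial encoder, and the classifier on top of the descriptor would all be learned end to end.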
