Scene categories exhibit high inter-class similarity, owing to the complex composition within scenes and to objects that co-occur across scenes, so many studies have exploited object semantic knowledge to improve scene recognition. However, the semantic segmentation or object detection techniques these methods rely on demand heavy computation and add a considerable burden to the network, which often makes object-assisted approaches unsuitable for edge devices. In contrast, this paper proposes a semantic-based similarity prototype that helps a scene recognition network achieve higher accuracy without increasing its parameters. The prototype is simple and can be plugged into existing pipelines. Specifically, a statistical strategy depicts the semantic knowledge in scenes as class-level semantic representations. These representations are used to explore inter-class correlations, from which the similarity prototype is constructed. Furthermore, we propose two ways to use the similarity prototype to support network training: gradient label softening and a batch-level contrastive loss. Comprehensive evaluations on multiple benchmarks show that our similarity prototype enhances the performance of existing networks without adding any computational burden. Code and the statistical similarity prototype will be available soon.
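The abstract gives no formulas, so the following is only a minimal sketch of how a statistical similarity prototype could be built and used to soften labels. It is not the authors' implementation. It assumes that each training image already has an object-frequency histogram (for example, computed offline by a segmentation model), that class-level representations are per-class means of those histograms, that inter-class correlation is measured by cosine similarity, and that softening mixes the one-hot label with the normalized similarity row using a hypothetical weight `alpha`. All function and variable names are illustrative.

```python
import numpy as np


def class_semantic_representations(object_hist, scene_labels, num_classes):
    """Average the per-image object histograms within each scene class."""
    reps = np.zeros((num_classes, object_hist.shape[1]))
    for c in range(num_classes):
        members = object_hist[scene_labels == c]
        if len(members) > 0:
            reps[c] = members.mean(axis=0)
    return reps


def similarity_prototype(reps, eps=1e-12):
    """Cosine similarity between every pair of class representations."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.maximum(norms, eps)
    return unit @ unit.T


def soften_labels(scene_labels, prototype, alpha=0.1):
    """Move a share `alpha` of the label mass onto semantically similar classes.

    `alpha` is a hypothetical hyperparameter, not a value from the paper.
    """
    num_classes = prototype.shape[0]
    one_hot = np.eye(num_classes)[scene_labels]
    # Similarity of each sample's class to all *other* classes.
    others = prototype[scene_labels].copy()
    others[np.arange(len(scene_labels)), scene_labels] = 0.0
    row_sums = others.sum(axis=1, keepdims=True)
    has_neighbours = row_sums > 0
    spread = others / np.where(has_neighbours, row_sums, 1.0)
    # Classes with no similar neighbour keep their hard label.
    weight = np.where(has_neighbours, alpha, 0.0)
    return (1.0 - weight) * one_hot + weight * spread


# Toy data: 4 images, 3 object categories, 3 scene classes (all made up).
object_hist = np.array([
    [0.7, 0.3, 0.0],
    [0.5, 0.5, 0.0],
    [0.6, 0.2, 0.2],
    [0.0, 0.1, 0.9],
])
scene_labels = np.array([0, 0, 1, 2])

reps = class_semantic_representations(object_hist, scene_labels, num_classes=3)
prototype = similarity_prototype(reps)
soft_targets = soften_labels(scene_labels, prototype, alpha=0.1)
```

Because the prototype is computed once from dataset statistics and only changes the training targets, a sketch like this adds no parameters to the recognition network, which is consistent with the claim in the abstract. The soft targets can be passed to any cross-entropy loss that accepts probability vectors.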