Scene classes often show high inter-class similarity because scenes have complex compositions and share co-occurring objects. Many studies have therefore exploited object-level semantic knowledge to improve scene recognition. However, extracting object information is computationally expensive and places a considerable burden on the network, which often makes object-assisted approaches impractical on edge devices. In contrast, this paper proposes a semantic knowledge-based similarity prototype that helps a scene recognition network achieve higher accuracy without increasing computational cost at deployment. The prototype is simple and can be plugged into existing pipelines. Specifically, a statistical strategy represents the semantic knowledge in scenes as class-level semantic representations. These representations are used to explore correlations between scene classes, from which the similarity prototype is constructed. We then use the similarity prototype to support network training in two ways: Gradient Label Softening and Batch-level Contrastive Loss. Comprehensive evaluations on multiple benchmarks show that the similarity prototype improves the performance of existing networks while adding no computational burden at deployment. Code and the statistical similarity prototype will be available soon.
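To make the pipeline described above more concrete, the following is a minimal sketch of one way a class-level similarity prototype could be built and then used to soften training labels. The abstract does not specify the statistic, the similarity measure, or the softening rule, so everything here is an assumption made for illustration: per-image object frequency vectors as the semantic knowledge, per-class averaging as the statistical strategy, cosine similarity for class correlation, and a fixed mixing weight `alpha` for softening. The function names are hypothetical, and the Batch-level Contrastive Loss is not covered.

```python
import numpy as np


def class_level_representations(object_counts, labels, num_classes):
    """Average per-image object statistics within each scene class.

    object_counts: (N, K) array of per-image object frequencies (assumed input).
    labels:        (N,) array of integer scene labels.
    """
    reps = np.zeros((num_classes, object_counts.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            reps[c] = object_counts[mask].mean(axis=0)
    return reps


def similarity_prototype(reps, eps=1e-12):
    """Class-by-class cosine similarity matrix (assumed similarity measure)."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.maximum(norms, eps)
    return unit @ unit.T


def soften_labels(labels, prototype, alpha=0.1):
    """Mix one-hot labels with the similarity of each class to the others.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    num_classes = prototype.shape[0]
    one_hot = np.eye(num_classes)[labels]
    # Similarity of the true class to every other class, with self removed.
    sim = np.clip(prototype[labels], 0.0, None)
    sim[np.arange(len(labels)), labels] = 0.0
    row_sum = sim.sum(axis=1, keepdims=True)
    sim = np.divide(sim, row_sum, out=np.zeros_like(sim), where=row_sum > 0)
    # Fall back to the plain one-hot label when no other class is similar.
    weight = np.where(row_sum > 0, alpha, 0.0)
    return (1.0 - weight) * one_hot + weight * sim


# Toy data: 4 images, 3 object categories, 3 scene classes.
object_counts = np.array([
    [2.0, 1.0, 0.0],
    [2.0, 1.0, 0.0],
    [2.0, 0.0, 1.0],
    [0.0, 0.0, 3.0],
])
labels = np.array([0, 0, 1, 2])

reps = class_level_representations(object_counts, labels, num_classes=3)
proto = similarity_prototype(reps)
soft = soften_labels(labels, proto, alpha=0.1)
```

In this sketch the prototype is computed once, offline, and only changes the training targets, which is consistent with the claim that inference cost is unchanged. How the paper's "Gradient" softening actually schedules or weights the soft targets is not stated in the abstract and is not modeled here.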