For semantic segmentation in urban scene understanding, RGB cameras alone often fail to capture a clear holistic topology, especially under challenging lighting conditions. The thermal signal is an informative additional channel that can reveal the contours and fine-grained texture of blurred regions in low-quality RGB images. Existing RGB-T (thermal) segmentation methods either use simple, passive channel-wise or spatial-wise fusion for cross-modal interaction, or rely on heavy labeling of ambiguous boundaries for fine-grained supervision. We propose a Spatial-aware Demand-guided Recursive Meshing (SpiderMesh) framework that 1) proactively compensates for inadequate contextual semantics in optically impaired regions via a demand-guided target masking algorithm, and 2) refines multimodal semantic features with recursive meshing to improve pixel-level semantic analysis. We further introduce an asymmetric data augmentation technique, M-CutOut, and enable semi-supervised learning to make full use of RGB-T labels, which are only sparsely available in practice. Extensive experiments on the MFNet and PST900 datasets demonstrate that SpiderMesh achieves new state-of-the-art performance on standard RGB-T segmentation benchmarks.
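The abstract names M-CutOut only as an asymmetric data augmentation and gives no details. As a rough illustration of what "asymmetric" could mean for paired RGB-T inputs, the sketch below zeroes a random rectangle in exactly one modality and leaves the other untouched. This is an assumption, not the authors' definition of M-CutOut: the function name `asymmetric_cutout`, the `max_frac` parameter, the 50/50 modality choice, and the zero fill value are all hypothetical.

```python
import numpy as np


def asymmetric_cutout(rgb, thermal, rng, max_frac=0.5):
    """Zero a random rectangle in exactly one modality.

    Hypothetical sketch of an asymmetric CutOut-style augmentation;
    not taken from the SpiderMesh paper.
    """
    h, w = rgb.shape[:2]
    # Rectangle size: at least 1 pixel, at most max_frac of each dimension.
    ch = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    cw = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    # Top-left corner chosen so the rectangle stays inside the image.
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))

    # Work on copies so the caller's arrays are not modified.
    rgb_out, thermal_out = rgb.copy(), thermal.copy()
    if rng.random() < 0.5:
        rgb_out[top:top + ch, left:left + cw] = 0
        masked = "rgb"
    else:
        thermal_out[top:top + ch, left:left + cw] = 0
        masked = "thermal"
    return rgb_out, thermal_out, masked


rng = np.random.default_rng(0)
rgb = np.ones((8, 8, 3), dtype=np.float32)
thermal = np.ones((8, 8), dtype=np.float32)
rgb_aug, thermal_aug, masked = asymmetric_cutout(rgb, thermal, rng)
```

In this sketch the segmentation labels would stay unchanged, so the network has to recover the masked region from the intact modality.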