For semantic segmentation in urban scene understanding, RGB cameras alone often fail to capture a clear holistic topology under challenging lighting conditions. The thermal signal is an informative additional channel that can reveal the contours and fine-grained texture of blurred regions in low-quality RGB images. Aiming at practical RGB-T (thermal) segmentation, we propose a Spatial-aware Demand-guided Recursive Meshing (SpiderMesh) framework that: 1) proactively compensates for inadequate contextual semantics in optically impaired regions via a demand-guided target masking algorithm; and 2) refines multimodal semantic features with recursive meshing to improve pixel-level semantic analysis performance. We further introduce an asymmetric data augmentation technique, M-CutOut, and enable semi-supervised learning to fully utilize RGB-T labels, which are only sparsely available in practice. Extensive experiments on the MFNet and PST900 datasets demonstrate that SpiderMesh achieves state-of-the-art performance on standard RGB-T segmentation benchmarks.