RGB-thermal semantic segmentation is one potential solution for reliable semantic scene understanding in adverse weather and lighting conditions. However, previous studies have mostly focused on designing a multi-modal fusion module without considering the nature of multi-modal inputs. As a result, networks easily become over-reliant on a single modality, which makes it difficult to learn complementary and meaningful representations for each modality. This paper proposes 1) a complementary random masking strategy for RGB-T images and 2) a self-distillation loss between clean and masked input modalities. The proposed masking strategy prevents over-reliance on a single modality. It also improves the accuracy and robustness of the neural network by forcing it to segment and classify objects even when one modality is only partially available. In addition, the proposed self-distillation loss encourages the network to extract complementary and meaningful representations from a single modality or from complementarily masked modalities. With the proposed method, we achieve state-of-the-art performance on three RGB-T semantic segmentation benchmarks. Our source code is available at https://github.com/UkcheolShin/CRM_RGBTSeg.
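The two ideas above can be illustrated with a minimal, pure-Python sketch. It is not the authors' implementation, and several details are assumptions that the abstract does not state. Masks are drawn over non-overlapping square patches. The thermal mask is the exact complement of the RGB mask, so every pixel stays visible in exactly one modality. The self-distillation term is taken here to be a per-pixel KL divergence that uses the clean-input prediction as the fixed target. The function names and the `patch`/`mask_ratio` parameters are hypothetical.

```python
import math
import random


def complementary_patch_masks(height, width, patch, mask_ratio=0.5, seed=None):
    """Return (rgb_keep, thermal_keep): H x W binary masks that sum to 1 everywhere.

    Assumption: masking is done per non-overlapping patch x patch block.
    A random mask_ratio fraction of blocks is hidden from RGB, and exactly
    the remaining blocks are hidden from thermal.
    """
    rng = random.Random(seed)
    rows = math.ceil(height / patch)
    cols = math.ceil(width / patch)
    n_blocks = rows * cols
    hidden_from_rgb = set(rng.sample(range(n_blocks), round(mask_ratio * n_blocks)))
    rgb_keep = [
        [0 if (r // patch) * cols + (c // patch) in hidden_from_rgb else 1
         for c in range(width)]
        for r in range(height)
    ]
    # Complementary mask: thermal keeps exactly the pixels RGB drops.
    thermal_keep = [[1 - k for k in row] for row in rgb_keep]
    return rgb_keep, thermal_keep


def apply_mask(image, keep):
    """Zero out every channel of pixels where keep == 0 (image: H x W x C lists)."""
    return [[[v * k for v in px] for px, k in zip(img_row, keep_row)]
            for img_row, keep_row in zip(image, keep)]


def log_softmax(logits):
    # Numerically stable log-softmax over one pixel's class scores.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def self_distillation_loss(clean_logits, masked_logits):
    """Mean per-pixel KL(p_clean || p_masked).

    Assumption: the clean-input prediction acts as the (non-trainable) teacher
    and the masked-input prediction as the student.
    """
    total = 0.0
    for zc, zm in zip(clean_logits, masked_logits):
        log_p = log_softmax(zc)
        log_q = log_softmax(zm)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return total / len(clean_logits)


if __name__ == "__main__":
    rgb_keep, thermal_keep = complementary_patch_masks(4, 4, patch=2, seed=0)
    for rgb_row, th_row in zip(rgb_keep, thermal_keep):
        print(rgb_row, th_row)
```

In a training loop built this way, the network would see the masked pair `(apply_mask(rgb, rgb_keep), apply_mask(thermal, thermal_keep))` as well as the clean pair. The self-distillation term would then pull the masked-input prediction toward the clean one, alongside the usual segmentation loss. Because the masks are complementary, no region is hidden from both modalities at once, so the network must rely on whichever modality remains visible there.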