RGB-T semantic segmentation is a key technique for autonomous driving scene understanding. However, existing RGB-T semantic segmentation methods do not effectively explore the complementary relationship between modalities when information interacts across multiple levels. To address this issue, we propose the Context-Aware Interaction Network (CAINet) for RGB-T semantic segmentation, which constructs an interaction space that exploits auxiliary tasks and global context for explicitly guided learning. Specifically, we propose a Context-Aware Complementary Reasoning (CACR) module that establishes the complementary relationship between multimodal features using long-term context in both the spatial and channel dimensions. Further, considering the importance of global contextual and detailed information, we propose the Global Context Modeling (GCM) module and the Detail Aggregation (DA) module, and we introduce specific auxiliary supervision to explicitly guide the context interaction and refine the segmentation map. Extensive experiments on two benchmark datasets, MFNet and PST900, demonstrate that the proposed CAINet achieves state-of-the-art performance. The code is available at https://github.com/YingLv1106/CAINet.
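As an illustration of how auxiliary supervision is commonly combined with a main segmentation loss, the following is a minimal sketch and not the CAINet implementation. The function names, the per-head weights, and the use of pixel-wise cross-entropy for every head are assumptions made for this example; the abstract does not specify the auxiliary targets or loss terms.

```python
import math


def pixel_cross_entropy(logits, labels):
    """Mean cross-entropy over pixels.

    logits: list of per-pixel class-score lists.
    labels: list of integer class ids, one per pixel.
    """
    total = 0.0
    for scores, target in zip(logits, labels):
        m = max(scores)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_sum_exp - scores[target]
    return total / len(labels)


def supervised_loss(main_logits, main_labels, aux_heads):
    """Main segmentation loss plus weighted auxiliary losses.

    aux_heads: list of (logits, labels, weight) tuples. Each auxiliary head
    carries its own target map and an assumed, hypothetical weight.
    """
    loss = pixel_cross_entropy(main_logits, main_labels)
    for aux_logits, aux_labels, weight in aux_heads:
        loss += weight * pixel_cross_entropy(aux_logits, aux_labels)
    return loss
```

A call such as `supervised_loss(main_logits, labels, [(aux_logits, aux_labels, 0.5)])` adds one auxiliary term at half the weight of the main loss. In practice the weights would be tuned, and each auxiliary head could use a loss suited to its own task.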