The ability to understand scenes in adverse visual conditions, e.g., nighttime, has sparked active research on RGB-Thermal (RGB-T) semantic segmentation. However, this research is hampered by two critical problems: 1) the day-night gap of RGB images is larger than that of thermal images, and 2) the class-wise performance of RGB images at night is not consistently higher or lower than that of thermal images. To address these problems, we propose the first test-time adaptation (TTA) framework for nighttime RGB-T semantic segmentation, dubbed Night-TTA, which requires no access to the source (daytime) data during adaptation. Our method comprises three key technical components. First, because one modality (e.g., RGB) suffers from a larger domain gap than the other (e.g., thermal), Imaging Heterogeneity Refinement (IHR) adds an interaction branch on top of the RGB and thermal branches to mitigate cross-modal discrepancy and performance degradation. Second, Class Aware Refinement (CAR) obtains reliable ensemble logits through pixel-level distribution aggregation over the three branches. Third, we design a learning scheme specific to our TTA framework, in which the ensemble logits and the three student logits learn collaboratively to improve prediction quality during the testing phase of Night-TTA. Extensive experiments show that our method achieves state-of-the-art (SoTA) performance with a 13.07% boost in mIoU.
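To make the idea of pixel-level aggregation over three branches more concrete, the following is a minimal NumPy sketch. It is not the authors' CAR or learning scheme: the abstract does not state the aggregation rule or the loss, so the entropy-based per-pixel weighting and the KL consistency term used here are assumptions chosen for illustration, and the function names `ensemble_probs` and `consistency_loss` are hypothetical.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def ensemble_probs(branch_logits):
    """Aggregate per-branch predictions into one distribution per pixel.

    branch_logits: array of shape (B, H, W, C) holding the logits of
    B branches (e.g. RGB, thermal, interaction) over C classes.
    ASSUMPTION: branches are weighted per pixel by their confidence,
    measured as negative prediction entropy. The paper's actual rule
    may differ.
    """
    probs = softmax(branch_logits, axis=-1)                      # (B, H, W, C)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)      # (B, H, W)
    weights = softmax(-entropy, axis=0)                          # sums to 1 over branches
    return (weights[..., None] * probs).sum(axis=0)              # (H, W, C)


def consistency_loss(branch_logits, target_probs):
    """Mean KL(target || branch) over all branches and pixels.

    ASSUMPTION: a KL term stands in for the collaborative learning
    objective, which the abstract does not define.
    """
    log_p = np.log(softmax(branch_logits, axis=-1) + 1e-12)      # (B, H, W, C)
    log_t = np.log(target_probs + 1e-12)                         # (H, W, C)
    kl = (target_probs * (log_t - log_p)).sum(axis=-1)           # (B, H, W)
    return kl.mean()


# Toy example: 3 branches, a 4x5 image, 6 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 4, 5, 6))
ensemble = ensemble_probs(logits)
loss = consistency_loss(logits, ensemble)
```

In a test-time adaptation loop, a term like `loss` would be minimized with respect to the branch parameters on each unlabeled nighttime batch; the ensemble acts as the shared target, so a branch that is unreliable at a given pixel is pulled toward the more confident ones.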