Multispectral pedestrian detection has gained significant attention in recent years, particularly in autonomous driving applications. To address the challenges posed by adverse illumination conditions, the combination of thermal and visible images has demonstrated its advantages. However, existing fusion methods rely on the critical assumption that the RGB-Thermal (RGB-T) image pairs fully overlap. This assumption often does not hold in real-world applications, where the images may overlap only partially because of the sensor configuration. Moreover, sensor failure can cause a loss of information in one modality. As our main contribution, we propose a novel module, the Hybrid Attention (HA) mechanism, to mitigate the performance degradation caused by partial overlap and sensor failure, i.e., when at least part of the scene is acquired by only one sensor. We propose an improved RGB-T fusion algorithm that is robust against the partial overlap and sensor failure encountered during inference in real-world applications. We also leverage a mobile-friendly backbone to cope with the resource constraints of embedded systems. We evaluated the proposed method in experiments that simulate various partial-overlap and sensor-failure scenarios. The results demonstrate that our approach outperforms state-of-the-art methods, showing its advantage in handling these real-world challenges.
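The abstract does not say how the partial-overlap and sensor-failure scenarios are simulated. One plausible way, shown here only as a minimal sketch under our own assumptions, is to zero out the columns of one modality that fall outside a chosen overlap fraction, so that this region of the scene is seen by a single sensor; a full sensor failure is then the limiting case of zero overlap. The function names, the left-to-right column cut and the zero-fill are hypothetical choices for illustration, not the authors' evaluation protocol.

```python
import numpy as np


def simulate_partial_overlap(rgb, thermal, overlap_ratio, blank="thermal"):
    """Hypothetical partial-overlap simulation for an RGB-T pair.

    Both inputs are H x W x C arrays with the same height and width. The
    leftmost `overlap_ratio` fraction of columns is kept in both modalities;
    in the remaining columns the `blank` modality is zeroed, so that part of
    the scene is observed by one sensor only. Inputs are not modified.
    """
    if not 0.0 <= overlap_ratio <= 1.0:
        raise ValueError("overlap_ratio must be in [0, 1]")
    if rgb.shape[:2] != thermal.shape[:2]:
        raise ValueError("rgb and thermal must share height and width")
    if blank not in ("rgb", "thermal"):
        raise ValueError("blank must be 'rgb' or 'thermal'")
    rgb, thermal = rgb.copy(), thermal.copy()
    cut = int(round(overlap_ratio * rgb.shape[1]))  # first non-overlapping column
    target = thermal if blank == "thermal" else rgb
    target[:, cut:] = 0  # assumption: missing data is represented by zeros
    return rgb, thermal


def simulate_sensor_failure(rgb, thermal, failed="rgb"):
    """Hypothetical full failure of one sensor: zero overlap for that modality."""
    return simulate_partial_overlap(rgb, thermal, 0.0, blank=failed)


if __name__ == "__main__":
    rgb = np.ones((4, 10, 3), dtype=np.float32)
    thermal = np.ones((4, 10, 1), dtype=np.float32)
    _, t = simulate_partial_overlap(rgb, thermal, 0.6)
    print(t[0, :, 0])  # first 6 columns keep thermal data, last 4 are zero
```

Sweeping `overlap_ratio` from 1.0 down to 0.0, for either modality, would give a simple grid of degradation scenarios against which a fusion model could be compared; a real setup might instead crop to the actual fields of view of the two cameras.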