Infrared-visible object detection aims to achieve robust, even full-day, object detection by fusing the complementary information of infrared and visible images. However, the highly dynamic complementary characteristics of the two modalities and the commonly occurring modality misalignment make fusing this complementary information difficult. In this paper, we propose a Dynamic Adaptive Multispectral Detection Transformer (DAMSDet) to address these two challenges simultaneously. Specifically, we propose a Modality Competitive Query Selection strategy that provides useful prior information by dynamically selecting a salient base-modality feature representation for each object. To effectively mine complementary information and adapt to misalignment, we propose a Multispectral Deformable Cross-attention module that adaptively samples and aggregates multi-semantic-level features of the infrared and visible images for each object. In addition, we adopt the cascade structure of DETR to further mine complementary information. Experiments on four public datasets covering different scenes demonstrate significant improvements over other state-of-the-art methods. The code will be released at https://github.com/gjj45/DAMSDet.