$\mathbf{C}^2$Former: Calibrated and Complementary Transformer for RGB-Infrared Object Detection

Object detection on visible (RGB) and infrared (IR) images, as an emerging solution for robust around-the-clock detection, has received extensive attention in recent years. By combining RGB and IR information, object detectors have become more reliable and robust in practical applications. However, existing methods still suffer from modality miscalibration and fusion imprecision. Since transformers have a powerful capability to model pairwise correlations between different features, in this paper we propose a novel Calibrated and Complementary Transformer, called $\mathrm{C}^2$Former, to address these two problems simultaneously. In $\mathrm{C}^2$Former, we design an Inter-modality Cross-Attention (ICA) module to obtain calibrated and complementary features by learning the cross-attention relationship between the RGB and IR modalities. To reduce the computational cost of computing global attention in ICA, an Adaptive Feature Sampling (AFS) module is introduced to decrease the dimension of the feature maps. Because $\mathrm{C}^2$Former operates in the feature domain, it can be embedded into existing RGB-IR object detectors via the backbone network. Thus, one single-stage and one two-stage object detector, both incorporating our $\mathrm{C}^2$Former, are constructed to evaluate its effectiveness and versatility. With extensive experiments on the DroneVehicle and KAIST RGB-IR datasets, we verify that our method can fully utilize the RGB-IR complementary information and achieve robust detection results. The code is available at https://github.com/yuanmaoxun/Calibrated-and-Complementary-Transformer-for-RGB-Infrared-Object-Detection.git.
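
To make the idea of inter-modality cross-attention on downsampled feature maps concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes single-head scaled dot-product attention, uses fixed average pooling as a stand-in for the learned Adaptive Feature Sampling, and fuses by simple addition; the function names `avg_pool` and `cross_attention`, the projection matrices, and the tensor sizes are all hypothetical choices for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool(feat, stride):
    # Fixed average pooling on a (C, H, W) map.
    # Assumption: a simple stand-in for adaptive sampling, used only to
    # shrink the number of positions that attention has to compare.
    c, h, w = feat.shape
    feat = feat[:, : h - h % stride, : w - w % stride]
    return feat.reshape(c, h // stride, stride, w // stride, stride).mean(axis=(2, 4))

def cross_attention(query_feat, context_feat, w_q, w_k, w_v):
    # Queries come from one modality; keys and values come from the other.
    _, h, w = query_feat.shape
    q = query_feat.reshape(query_feat.shape[0], -1).T @ w_q        # (N, d)
    k = context_feat.reshape(context_feat.shape[0], -1).T @ w_k    # (M, d)
    v = context_feat.reshape(context_feat.shape[0], -1).T @ w_v    # (M, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[1]), axis=-1)         # (N, M)
    out = attn @ v                                                 # (N, d)
    return out.T.reshape(-1, h, w), attn

# Toy feature maps: 8 channels on a 16x16 grid (hypothetical sizes).
rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 16, 16))
ir = rng.standard_normal((8, 16, 16))

# Downsample first so the attention matrix is 16x16 instead of 256x256.
rgb_s, ir_s = avg_pool(rgb, 4), avg_pool(ir, 4)

w_q, w_k, w_v = (rng.standard_normal((8, 8)) * 0.1 for _ in range(3))

# Each RGB position gathers IR features from every IR position.
ir_for_rgb, attn = cross_attention(rgb_s, ir_s, w_q, w_k, w_v)
fused_rgb = rgb_s + ir_for_rgb  # assumption: additive fusion for illustration
```

Because every query position attends over all positions of the other modality, a spatial offset between the RGB and IR images does not prevent matching features from being combined. The attention matrix grows with the product of the two position counts, which is why reducing the feature-map size beforehand lowers the cost.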

