DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection

Object detection in poor-illumination environments is challenging because objects are often not clearly visible in RGB images. Infrared images provide clear edge information that complements RGB images, so fusing the two has the potential to improve detection under poor illumination. However, existing works that use both visible and infrared images focus only on image fusion rather than object detection. Moreover, they fuse the two modalities directly, which ignores the mutual interference between them. To fuse the two modalities in a way that maximizes the advantages of cross-modality information, we design DEYOLO, a dual-enhancement-based cross-modality object detection network in which semantic-spatial cross-modality modules and a novel bi-directional decoupled focus module achieve detection-centered mutual enhancement of RGB-infrared (RGB-IR) features. Specifically, a dual semantic enhancing channel weight assignment module (DECA) and a dual spatial enhancing pixel weight assignment module (DEPA) are first proposed to aggregate cross-modality information in the feature space and improve feature representation, so that feature fusion is aimed at the object detection task. Meanwhile, a dual-enhancement mechanism, comprising enhancement for two-modality fusion and for each single modality, is designed in both DECA and DEPA to reduce interference between the two modalities. Then, a novel bi-directional decoupled focus is developed to enlarge the receptive field of the backbone network in different directions, which improves the representation quality of DEYOLO. Extensive experiments on M3FD and LLVIP show that our approach outperforms state-of-the-art (SOTA) object detection algorithms by a clear margin. Our code is available at https://github.com/chips96/DEYOLO.
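To make the idea of channel-level and pixel-level weight assignment across modalities concrete, the following is a minimal NumPy sketch. It is not the DECA or DEPA design from DEYOLO, whose details the abstract does not give. It assumes, purely for illustration, that channel weights come from global average pooling followed by a sigmoid, that pixel weights come from the channel-wise mean followed by a sigmoid, and that each modality is reweighted by the other modality's weights in a residual form. All function names are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function, mapping values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def channel_weights(feat):
    """Per-channel weights from global average pooling (assumed design).

    feat has shape (C, H, W); the result has shape (C,).
    """
    return sigmoid(feat.mean(axis=(1, 2)))


def pixel_weights(feat):
    """Per-pixel weights from the channel-wise mean (assumed design).

    feat has shape (C, H, W); the result has shape (H, W).
    """
    return sigmoid(feat.mean(axis=0))


def cross_channel_enhance(rgb, ir):
    """Reweight each modality's channels using the other modality's weights.

    The residual form (x + x * w) keeps the original single-modality
    features alongside the cross-modality term.
    """
    w_rgb = channel_weights(rgb)[:, None, None]  # broadcast over H, W
    w_ir = channel_weights(ir)[:, None, None]
    return rgb + rgb * w_ir, ir + ir * w_rgb


def cross_pixel_enhance(rgb, ir):
    """Reweight each modality's pixels using the other modality's weights."""
    p_rgb = pixel_weights(rgb)[None, :, :]  # broadcast over C
    p_ir = pixel_weights(ir)[None, :, :]
    return rgb + rgb * p_ir, ir + ir * p_rgb


# Random stand-ins for backbone feature maps of the two modalities.
rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((8, 16, 16))
ir_feat = rng.standard_normal((8, 16, 16))

rgb_c, ir_c = cross_channel_enhance(rgb_feat, ir_feat)
rgb_out, ir_out = cross_pixel_enhance(rgb_c, ir_c)
print(rgb_out.shape, ir_out.shape)
```

Applying the channel step before the pixel step follows the order in which the abstract names the two modules; a trained network would learn these weights with convolutional layers rather than compute them from fixed pooling.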
