Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion

Traffic object detection under variable illumination is challenging because the limited dynamic range of conventional frame-based cameras causes information loss. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream and uses cross-modality fusion to compensate for the information lost in images, enabling the network to obtain illumination-robust representations for traffic object detection. Specifically, fixed-interval event sampling yields sparse or blurred structures when traffic objects are in different motion states; to mitigate this, we propose the Reliable Structure Generation Network (RSGNet) to generate Speed Invariant Frames (SIF), ensuring the integrity and sharpness of object structures. Next, we design a novel Adaptive Feature Complement Module (AFCM), which perceives the global lightness distribution of the images and uses it to guide the adaptive fusion of the two modalities' features, compensating for the information loss in the images and generating illumination-robust representations. Finally, considering the lack of large-scale, high-quality annotations in existing event-based object detection datasets, we build the DSEC-Det dataset, which consists of 53 sequences with 63,931 images and more than 208,000 labels for 8 classes. Extensive experimental results demonstrate that our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95. Our code and dataset will be available at https://github.com/YN-Yang/SFNet.
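The sparsity and blur attributed above to fixed-interval event sampling can be illustrated with a small toy sketch. It is not RSGNet or any part of SFNet: the event tuple layout `(t_us, x, y, polarity)`, the helper names, and the single-edge motion model are all assumptions made for illustration. The sketch accumulates events over one fixed time window and counts how many pixels the same edge touches when it moves slowly and when it moves fast.

```python
def moving_edge_events(num_pixels, duration_us, y=0):
    """Hypothetical toy source: one edge crossing `num_pixels` pixels along x
    in `duration_us` microseconds, emitting one positive event per pixel."""
    return [(i * duration_us // num_pixels, i, y, 1) for i in range(num_pixels)]


def accumulate_events(events, t_start, t_end, width, height):
    """Sum signed polarities of events with t_start <= t < t_end into a frame."""
    frame = [[0] * width for _ in range(height)]
    for t, x, y, p in events:
        if t_start <= t < t_end:
            frame[y][x] += 1 if p > 0 else -1
    return frame


def active_pixels(frame):
    """Number of pixels that received a nonzero net event count."""
    return sum(1 for row in frame for v in row if v != 0)


WINDOW_US = 10_000  # one fixed sampling interval (assumed value)

# Slow object: the edge crosses 2 pixels during the window.
slow = accumulate_events(moving_edge_events(2, WINDOW_US), 0, WINDOW_US, 64, 1)
# Fast object: the edge crosses 40 pixels during the same window.
fast = accumulate_events(moving_edge_events(40, WINDOW_US), 0, WINDOW_US, 64, 1)
# The fast object seen through a ten times shorter window.
fast_short = accumulate_events(moving_edge_events(40, WINDOW_US), 0, 1_000, 64, 1)

print(active_pixels(slow), active_pixels(fast), active_pixels(fast_short))
```

With the same window, the slow edge leaves only a couple of active pixels (a sparse structure), while the fast edge smears across many pixels (a blurred trail). Shortening the window sharpens the fast edge but would starve the slow one further, which is why a single fixed interval cannot suit both motion states.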
