Conventional RGB cameras have been widely used in multi-object tracking because they capture rich appearance and semantic information. However, their performance often degrades under challenging real-world conditions such as motion blur, low illumination, and overexposure. Bio-inspired event cameras offer high temporal resolution and high dynamic range, providing complementary cues in these extreme scenarios. Nevertheless, RGB-event multi-object tracking remains underexplored due to the lack of large-scale, well-annotated datasets. To address this issue, we propose FEMOT, a large-scale RGB-event multi-object tracking dataset that covers diverse real-world scenarios and 14 challenging attributes. With both RGB and event data as well as high-quality annotations, FEMOT provides a reliable platform for systematically evaluating RGB-event multi-object tracking methods. Based on FEMOT, we retrain and evaluate more than ten strong trackers, establishing a comprehensive benchmark for future research. Furthermore, we propose FEMOTR, a multimodal tracking framework that decouples RGB and event features and fuses them in the frequency domain, exploiting their complementary characteristics for robust object localization and identity association. Extensive experiments on the FEMOT and DSEC-MOT datasets demonstrate the effectiveness of the proposed method. The source code and benchmark dataset are available at https://github.com/Event-AHU/FEMOT.