Unlike visible cameras, which record intensity images frame by frame, the biologically inspired event camera produces a stream of asynchronous, sparse events with much lower latency. In practice, visible cameras better perceive texture details and slow motion, while event cameras are free from motion blur and have a larger dynamic range, which enables them to work well under fast motion and low illumination. The two sensors can therefore complement each other to achieve more reliable object tracking. Because no realistic, large-scale dataset exists for this task, we propose a large-scale Visible-Event benchmark (termed VisEvent). Our dataset consists of 820 video pairs captured under low-illumination, high-speed, and background-clutter scenarios, and it is divided into a training subset and a testing subset containing 500 and 320 videos, respectively. Based on VisEvent, we transform the event streams into event images and construct more than 30 baseline methods by extending current single-modality trackers into dual-modality versions. More importantly, we build a simple but effective tracking algorithm around a proposed cross-modality transformer, which achieves more effective feature fusion between visible and event data. Extensive experiments on the proposed VisEvent dataset, FE108, COESOT, and two simulated datasets (i.e., OTB-DVS and VOT-DVS) validate the effectiveness of our model. The dataset and source code have been released at: \url{https://github.com/wangxiao5791509/VisEvent_SOT_Benchmark}.
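The abstract does not detail how event streams become event images. A common approach, assumed here purely for illustration, is to accumulate every event that falls inside the time window of one visible frame into a per-polarity count image. The sketch below follows that assumption; the function name `events_to_image` and the `(x, y, t, p)` tuple layout are hypothetical and are not taken from the VisEvent code.

```python
from typing import List, Tuple

# Hypothetical event layout: (x, y, t, p) = pixel column, pixel row,
# timestamp in seconds, polarity (+1 brightness increase, -1 decrease).
Event = Tuple[int, int, float, int]


def events_to_image(events: List[Event], width: int, height: int,
                    t_start: float, t_end: float) -> List[List[List[int]]]:
    """Accumulate events with t_start <= t < t_end into a 2-channel count image.

    Assumption: channel 0 counts positive-polarity events and channel 1 counts
    negative-polarity events. The result is indexed as image[channel][y][x].
    """
    image = [[[0] * width for _ in range(height)] for _ in range(2)]
    for x, y, t, p in events:
        if not (t_start <= t < t_end):
            continue  # outside the time window aligned with this visible frame
        if not (0 <= x < width and 0 <= y < height):
            continue  # skip coordinates that fall outside the sensor
        channel = 0 if p > 0 else 1
        image[channel][y][x] += 1
    return image


if __name__ == "__main__":
    # Three events land in a ~33 ms window; the fourth (t = 0.05 s) does not.
    events = [(0, 0, 0.001, 1), (0, 0, 0.002, 1), (1, 0, 0.003, -1), (1, 1, 0.05, 1)]
    img = events_to_image(events, width=2, height=2, t_start=0.0, t_end=0.033)
    print(img)
```

Counting per polarity keeps the representation simple and aligned with the visible frames, so an existing single-modality backbone can consume it like an ordinary image; other encodings (e.g., time surfaces or voxel grids) trade this simplicity for more temporal detail.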