Combining color cameras with event cameras (also called Dynamic Vision Sensors, DVS) for robust object tracking has emerged as a research topic in recent years. Existing color-event tracking frameworks usually contain multiple scattered modules, including feature extraction, fusion, matching, and interactive learning, which may lead to low efficiency and high computational complexity. In this paper, we propose a single-stage backbone network for Color-Event Unified Tracking (CEUTrack), which performs the above functions simultaneously. Given the event points and RGB frames, we first transform the points into voxels and crop the template and search regions for both modalities. These regions are then projected into tokens and fed in parallel into the unified Transformer backbone network. The output features are fed into a tracking head for target object localization. Our proposed CEUTrack is simple, effective, and efficient: it runs at over 75 FPS and achieves new state-of-the-art (SOTA) performance. To better validate the effectiveness of our model and address the data deficiency of this task, we also propose a generic and large-scale benchmark dataset for color-event tracking, termed COESOT, which contains 90 categories and 1354 video sequences. Additionally, a new evaluation metric named BOC is proposed in our evaluation toolkit to evaluate a tracker's prominence with respect to the baseline methods. We hope the newly proposed method, dataset, and evaluation metric provide a better platform for color-event-based tracking. The dataset, toolkit, and source code will be released at: \url{https://github.com/Event-AHU/COESOT}.
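The preprocessing step described in the abstract (turning event points into voxels, then cropping a region around the target) can be illustrated with a minimal sketch. This is not the CEUTrack implementation: the event layout `(x, y, timestamp, polarity)`, the uniform time binning, the polarity accumulation, and the helper names `events_to_voxels` and `crop` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed event layout: (x, y, timestamp, polarity), with polarity in {-1, +1}.
Event = Tuple[int, int, float, int]
Grid = List[List[List[int]]]


def events_to_voxels(events: Sequence[Event], width: int, height: int,
                     num_bins: int) -> Grid:
    """Accumulate event polarities into a num_bins x height x width grid."""
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for equal timestamps
    for x, y, t, p in events:
        if not (0 <= x < width and 0 <= y < height):
            continue  # drop events that fall outside the sensor area
        # Map the timestamp to a bin; the latest event goes into the last bin.
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


def crop(grid: Grid, cx: int, cy: int, size: int) -> Grid:
    """Crop a size x size window around (cx, cy) in every time bin.

    Positions outside the grid are zero-padded.
    """
    half = size // 2
    height, width = len(grid[0]), len(grid[0][0])
    out = []
    for plane in grid:
        window = []
        for y in range(cy - half, cy - half + size):
            row = []
            for x in range(cx - half, cx - half + size):
                inside = 0 <= x < width and 0 <= y < height
                row.append(plane[y][x] if inside else 0)
            window.append(row)
        out.append(window)
    return out


if __name__ == "__main__":
    demo_events = [(0, 0, 0.0, 1), (1, 0, 0.5, -1), (1, 1, 1.0, 1)]
    voxels = events_to_voxels(demo_events, width=2, height=2, num_bins=2)
    print(voxels)
    print(crop(voxels, cx=0, cy=0, size=2))
```

In this sketch the same `crop` helper would be called twice per modality, once with a smaller window for the template and once with a larger window for the search region; the actual window sizes and the token projection that follows are not specified in the abstract and are left out here.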