Cross-view multi-object tracking aims to link objects across frames and across camera views whose fields of view overlap substantially. Although the task has received increasing attention in recent years, existing datasets have several shortcomings: 1) they miss real-world scenarios, 2) they lack scene diversity, 3) they contain a limited number of tracks, 4) they comprise only static cameras, and 5) they lack standard benchmarks. These shortcomings hinder the investigation and comparison of cross-view tracking methods. To address them, we introduce DIVOTrack, a new cross-view multi-object tracking dataset for DIVerse Open scenes with densely tracked pedestrians in realistic, non-experimental environments. DIVOTrack has ten distinct scenarios and 550 cross-view tracks, surpassing all currently available cross-view multi-object tracking datasets. Furthermore, we provide a novel baseline cross-view tracking method named CrossMOT, a unified joint detection and cross-view tracking framework that learns object detection, single-view association, and cross-view matching with an all-in-one embedding model. Finally, we summarize current methodologies and present a set of standard benchmarks on DIVOTrack, which allow a fair comparison and a comprehensive analysis of current approaches and our proposed CrossMOT. The dataset and code are available at https://github.com/shengyuhao/DIVOTrack.