Multi-view aggregation promises to overcome occlusion and missed detections in multi-object detection and tracking. Recent approaches in multi-view detection and 3D object detection achieved a large performance leap by projecting all views onto the ground plane and performing detection in the Bird's Eye View (BEV). In this paper, we investigate whether tracking in the BEV can also bring the next performance breakthrough in Multi-Target Multi-Camera (MTMC) tracking. Most current multi-view tracking approaches perform detection and tracking in each view separately and use graph-based methods to associate pedestrians across views. Detecting each pedestrian once in the BEV already solves this spatial association, leaving only the problem of temporal association. For the temporal association, we show how to learn strong Re-Identification (re-ID) features for each detection. The results show that early fusion in the BEV achieves high accuracy for both detection and tracking. EarlyBird outperforms state-of-the-art methods, improving the current state of the art on Wildtrack by +4.6 MOTA and +5.6 IDF1.
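To make the temporal association step concrete, the following is a minimal sketch of how detections in the BEV might be linked to existing tracks using both ground-plane distance and re-ID appearance features. It is an illustration under stated assumptions, not the association procedure used by EarlyBird, which the abstract does not specify: the function names `associate` and `cosine_distance`, the dictionary keys `pos` and `feat`, and the parameters `max_dist` and `w_reid` are all hypothetical, and simple greedy matching stands in for whatever assignment method the authors use.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two feature vectors (0 means identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm > 0 else 1.0


def associate(tracks, detections, max_dist=1.0, w_reid=0.5):
    """Greedily match tracks to detections on the ground plane.

    tracks, detections: lists of dicts with
      'pos'  -> (x, y) ground-plane position (assumed to be in metres)
      'feat' -> re-ID feature vector (list of floats)
    max_dist: hypothetical gating radius; pairs farther apart are never matched.
    w_reid:   hypothetical weight of appearance cost versus position cost.
    Returns (matches, unmatched_detection_indices).
    """
    candidates = []
    for ti, track in enumerate(tracks):
        for di, det in enumerate(detections):
            dist = math.dist(track["pos"], det["pos"])
            if dist > max_dist:
                continue  # gated out: too far to be the same pedestrian
            appearance = cosine_distance(track["feat"], det["feat"])
            cost = (1.0 - w_reid) * dist / max_dist + w_reid * appearance
            candidates.append((cost, ti, di))

    # Accept the cheapest pairs first; each track and detection is used at most once.
    candidates.sort()
    matches, used_tracks, used_dets = [], set(), set()
    for _, ti, di in candidates:
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)

    unmatched = [di for di in range(len(detections)) if di not in used_dets]
    return matches, unmatched


# Two pedestrians who have swapped positions between frames, plus one newcomer.
tracks = [
    {"pos": (0.0, 0.0), "feat": [1.0, 0.0]},
    {"pos": (0.5, 0.0), "feat": [0.0, 1.0]},
]
detections = [
    {"pos": (0.4, 0.0), "feat": [1.0, 0.0]},
    {"pos": (0.1, 0.0), "feat": [0.0, 1.0]},
    {"pos": (5.0, 5.0), "feat": [1.0, 0.0]},
]
matches, unmatched = associate(tracks, detections)
```

In this toy example, matching on position alone would swap the two identities, because each detection lies closer to the other pedestrian's previous position. The appearance term keeps each track attached to the detection with the matching feature, and the distant third detection is left unmatched so that it can start a new track.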