The recent trend in multiple object tracking (MOT) is to solve detection and tracking jointly, learning object detection and appearance (or motion) features simultaneously. Despite competitive performance, joint detection and tracking usually fails to find accurate object associations in crowded scenes because of missed or false detections. In this paper, we jointly model counting, detection, and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT seeks a balance between object detection and crowd density map estimation, which helps it recover missed detections and reject false detections. Our approach attempts to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods that either ignore crowd density, and are thus prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed tracker performs online, real-time tracking and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 79.7%), MOT17 (MOTA of 81.3%), and MOT20 (MOTA of 78.9%).
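The idea of a mutual object-count constraint can be illustrated with a small sketch. This is not the loss used in CountingMOT, which the abstract does not specify; it only assumes that a density map integrates to an object count and that a detector's confident scores can be summed into a second count, so that disagreement between the two can be penalized. All function names, the threshold, and the toy values are hypothetical.

```python
# Illustrative only: a count-consistency penalty between a detector and a
# density map. NOT the paper's loss; every name here is hypothetical.

def density_count(density_map):
    """Estimated object count: the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density_map)


def detection_count(scores, threshold=0.5):
    """Soft count from the detector: sum of confidences at or above a threshold."""
    return sum(s for s in scores if s >= threshold)


def count_consistency_penalty(density_map, scores, threshold=0.5):
    """Absolute disagreement between the two count estimates."""
    return abs(density_count(density_map) - detection_count(scores, threshold))


# Toy example: the density map suggests three objects, while the detector
# is confident about only two of its three candidates.
density = [
    [0.0, 0.5, 0.5],
    [0.25, 0.25, 0.5],
    [0.0, 0.5, 0.5],
]
scores = [1.0, 1.0, 0.25]

penalty = count_consistency_penalty(density, scores)
print(penalty)
```

A nonzero penalty of this kind signals that the detector and the density estimate disagree on how many objects are present, which is the sort of cue that could prompt recovering a missed detection or rejecting a false one. In a trainable model the hard threshold would need a differentiable substitute.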