The recent trend in 2D multiple object tracking (MOT) is to solve detection and tracking jointly, learning object detection and appearance (or motion) features simultaneously. Despite their competitive performance, joint detection and tracking methods often fail to find accurate object associations in crowded scenes because of missed or false detections. In this paper, we jointly model counting, detection, and re-identification in an end-to-end framework, named CountingMOT, tailored for crowded scenes. By imposing mutual object-count constraints between detection and counting, CountingMOT seeks a balance between object detection and crowd density map estimation, which helps it recover missed detections and reject false ones. Our approach is an attempt to bridge the gap between object detection, counting, and re-identification. This is in contrast to prior MOT methods, which either ignore crowd density and are thus prone to failure in crowded scenes, or depend on local correlations to build a graphical relationship for matching targets. The proposed tracker runs online and in real time, and achieves state-of-the-art results on the public benchmarks MOT16 (MOTA of 79.7%), MOT17 (MOTA of 81.3%), and MOT20 (MOTA of 78.9%).
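To make the idea of a mutual object-count constraint concrete, the following is a minimal sketch, not the loss used in CountingMOT. It assumes the detector outputs a per-pixel confidence heatmap and the counting branch outputs a density map whose sum estimates the number of objects. The function names, the peak-picking rule, the threshold, and the absolute-difference penalty are all hypothetical choices made for illustration.

```python
def detection_count(heatmap, threshold=0.5):
    """Count strict local maxima at or above `threshold` in a 2D heatmap.

    Assumption: each such peak corresponds to one detected object.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    count = 0
    for r in range(rows):
        for c in range(cols):
            value = heatmap[r][c]
            if value < threshold:
                continue
            # Collect the values of the (up to 8) neighbouring cells.
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                count += 1
    return count


def density_count(density_map):
    """Estimate the object count as the sum over the density map."""
    return sum(sum(row) for row in density_map)


def count_consistency_penalty(heatmap, density_map, threshold=0.5):
    """Hypothetical penalty: absolute gap between the two count estimates."""
    return abs(detection_count(heatmap, threshold) - density_count(density_map))


# Toy example: the detector finds two peaks, the density map sums to three.
heatmap = [
    [0.9, 0.2, 0.1, 0.0],
    [0.1, 0.1, 0.2, 0.8],
    [0.3, 0.1, 0.1, 0.1],
]
density = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
penalty = count_consistency_penalty(heatmap, density)
```

In this toy case the penalty is nonzero because the density map suggests one more object than the detector reports, which is the kind of disagreement that could point to a missed detection. A trainable version would need a differentiable count on the detection side; the hard peak counting here is only for readability.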