Although end-to-end multi-object trackers such as MOTR have the merit of simplicity, they suffer seriously from a conflict between detection and association, which results in unsatisfactory convergence dynamics. MOTRv2 partly addresses this problem, but it requires an additional detection network for assistance. In this work, we are the first to reveal that this conflict arises from unfair label assignment between detect queries and track queries during training, where detect queries recognize targets and track queries associate them. Based on this observation, we propose MOTRv3, which balances the label assignment process using a release-fetch supervision strategy. In this strategy, labels are first released for detection and then gradually fetched back for association. In addition, two further strategies, pseudo label distillation and track group denoising, are designed to improve the supervision for detection and association. Without the assistance of an extra detection network during inference, MOTRv3 achieves impressive performance across diverse benchmarks, e.g., MOT17 and DanceTrack.
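The release-fetch idea can be illustrated with a toy sketch. The abstract does not specify how labels are released or at what rate they are fetched back, so everything below is an assumption made for illustration: the linear schedule, the random choice of which labels to release, and the names `release_ratio` and `split_labels` are hypothetical and are not taken from MOTRv3.

```python
import random


def release_ratio(step: int, total_steps: int) -> float:
    """Fraction of already-tracked labels released to detect queries.

    Hypothetical linear schedule (an assumption, not the paper's rule):
    1.0 at step 0 (all labels released for detection), falling to 0.0
    at total_steps (all labels fetched back for association).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    return min(1.0, max(0.0, 1.0 - step / total_steps))


def split_labels(tracked_ids, step, total_steps, rng):
    """Split the labels of already-tracked objects into two groups.

    Returns (released, fetched): labels supervised through detect
    queries and labels supervised through track queries. Picking the
    released labels at random is an illustrative assumption.
    """
    ratio = release_ratio(step, total_steps)
    n_release = round(ratio * len(tracked_ids))
    shuffled = list(tracked_ids)
    rng.shuffle(shuffled)
    released = sorted(shuffled[:n_release])
    fetched = sorted(shuffled[n_release:])
    return released, fetched


if __name__ == "__main__":
    rng = random.Random(0)
    ids = [1, 2, 3, 4]
    for step in (0, 50, 100):
        released, fetched = split_labels(ids, step, 100, rng)
        print(step, "released:", released, "fetched:", fetched)
```

Under this assumed schedule, detection receives all labels early in training and association regains them as training proceeds, which mirrors the "first released, then gradually fetched back" description only at a conceptual level.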