In this work, we investigate four fusion methods for associating detections with tracklets in multi-object visual tracking. In addition to strong cues such as motion and appearance information, we incorporate weak cues such as height intersection-over-union (height-IoU) and tracklet confidence into the data association through these fusion methods. The four methods are the minimum, an IoU-based weighted sum, Kalman filter (KF) gating, and the Hadamard product of the costs derived from the different cues. We conduct extensive evaluations on the validation sets of the MOT17, MOT20, and DanceTrack datasets and find that the choice of fusion method is key to data association in multi-object visual tracking. We hope that this investigative work helps the computer vision research community choose the right fusion method for data association in multi-object visual tracking.
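To make the four fusion methods named in the abstract concrete, the following is a minimal sketch of how per-cue cost matrices (rows are tracklets, columns are detections) might be combined. The abstract does not give the formulas, so everything here is an assumption for illustration: the function names, the fixed `weight` in the weighted sum, the gating threshold (the 95% chi-square quantile for 4 degrees of freedom), and the `gated_cost` placeholder are all hypothetical choices, not the authors' implementation.

```python
import numpy as np


def fuse_min(cost_a, cost_b):
    """Element-wise minimum of two cost matrices."""
    return np.minimum(cost_a, cost_b)


def fuse_weighted_sum(iou_cost, other_cost, weight=0.5):
    """Convex combination of an IoU cost and another cue's cost.

    `weight` is a hypothetical fixed hyperparameter; the source may
    derive its weights differently.
    """
    return weight * iou_cost + (1.0 - weight) * other_cost


def fuse_kf_gating(cost, maha_dist, gate=9.4877, gated_cost=1e5):
    """Keep `cost` but invalidate pairs that fail the Kalman filter gate.

    `maha_dist` holds squared Mahalanobis distances between predicted
    track states and detections. Pairs above `gate` are assigned
    `gated_cost` so that an assignment solver avoids them.
    """
    fused = np.array(cost, dtype=float)  # copy, so the input is not modified
    fused[np.asarray(maha_dist) > gate] = gated_cost
    return fused


def fuse_hadamard(*costs):
    """Element-wise (Hadamard) product of any number of cost matrices."""
    fused = np.ones_like(costs[0], dtype=float)
    for c in costs:
        fused = fused * c
    return fused


# Toy example: 2 tracklets x 2 detections (made-up values).
iou_cost = np.array([[0.2, 0.9], [0.8, 0.1]])
app_cost = np.array([[0.4, 0.7], [0.6, 0.3]])
maha = np.array([[1.0, 20.0], [3.0, 2.0]])

min_cost = fuse_min(iou_cost, app_cost)
sum_cost = fuse_weighted_sum(iou_cost, app_cost, weight=0.5)
gated = fuse_kf_gating(app_cost, maha)
had_cost = fuse_hadamard(iou_cost, app_cost)
```

Any of the fused matrices could then be passed to an assignment solver such as the Hungarian algorithm. Weak cues like a height-IoU cost or a tracklet-confidence term would enter as additional matrices of the same shape.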