Multi-Object Tracking (MOT) is a central task in video analysis, yet both traditional and deep learning-based approaches have inherent limitations. Purely data-driven deep learning methods struggle to determine the motion states of objects accurately, while traditional methods that rely on comprehensive mathematical models may suffer from suboptimal tracking precision. To address these challenges, we introduce the Model-Data-Driven Motion-Static Object Tracking Method (MoD2T), a novel architecture that combines traditional mathematical modeling with deep learning-based MOT frameworks and thereby mitigates the limitations of relying on either alone. This fusion improves the precision of object motion determination and, consequently, tracking accuracy. Experiments demonstrate MoD2T's efficacy across diverse scenarios, including UAV aerial surveillance and street-level tracking. To assess MoD2T's ability to discern object motion states, we introduce the MVF1 metric, a novel performance metric designed to measure the accuracy of motion state classification, and we validate the rationale behind its formulation experimentally. For a comprehensive assessment, we annotate diverse datasets and test MoD2T on them. The MVF1 scores are particularly noteworthy in scenarios with minimal or mild camera motion: 0.774 on the KITTI dataset, 0.521 on MOT17, and 0.827 on UAVDT.