A Hybrid Approach To Real-Time Multi-Object Tracking

Multi-Object Tracking, also known as Multi-Target Tracking, is an important area of computer vision with applications in many settings. The rise of deep learning has strongly shaped research on tracking, as it has in many other areas of computer vision, and has prompted a growing body of work in this direction. In fact, all current state-of-the-art solutions, both in the literature and in the tracking industry, are built on deep learning methodologies that produce exceptionally good results. Deep learning is made possible by increasingly powerful hardware that lets researchers meet the significant computational demands of these models. However, when real-time operation is a primary requirement, widening the use of tracking in real-world contexts requires systems that do not depend on expensive hardware with enormous computational resources. To this end, one compromise is to combine powerful deep strategies with more traditional approaches, obtaining solutions with considerably lower processing cost at the price of somewhat less accurate tracking results that are nonetheless suitable for real-time domains. The present work goes in that direction, proposing a hybrid strategy for real-time multi-target tracking that effectively combines a classical optical flow algorithm with a deep learning architecture. It targets a human-crowd tracking system and exhibits a desirable trade-off between tracking precision and computational cost. The developed architecture was evaluated under different settings and yielded a MOTA of 0.608, against 0.549 for the compared state-of-the-art results; introducing the optical flow phase roughly halved the running time while achieving almost the same accuracy.
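For readers unfamiliar with the MOTA figure quoted above, the following is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, where the counts are summed over all frames. The function name, argument names, and the counts in the usage line are illustrative assumptions, not values or code from this work.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """Multiple Object Tracking Accuracy (standard CLEAR MOT definition).

    All arguments are totals accumulated over every frame of a sequence:
    missed targets, spurious detections, identity switches, and the number
    of ground-truth objects. Names are hypothetical, chosen for illustration.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    # MOTA can be negative when the tracker makes more errors than there
    # are ground-truth objects; 1.0 means no errors of any kind.
    return 1.0 - errors / num_gt


# Illustrative counts only (not taken from the experiments reported here).
score = mota(false_negatives=30, false_positives=15, id_switches=5, num_gt=100)
print(score)
```

Because all three error types share the same denominator, a higher MOTA reflects fewer misses, false alarms, and identity switches combined; it does not by itself measure localization precision.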
