Multi-object tracking (MOT) aims to associate target objects across video frames in order to recover their complete motion trajectories. With the advancement of deep neural networks and the growing demand for intelligent video analysis, MOT has attracted increasing interest in the computer vision community. Embedding methods play an essential role in object location estimation and temporal identity association in MOT. Unlike in other computer vision tasks, such as image classification, object detection, re-identification, and segmentation, embedding methods in MOT vary widely, and they have never been systematically analyzed and summarized. In this survey, we first present a comprehensive overview and in-depth analysis of embedding methods in MOT from seven perspectives: patch-level embedding, single-frame embedding, cross-frame joint embedding, correlation embedding, sequential embedding, tracklet embedding, and cross-track relational embedding. We then summarize the widely used MOT datasets and analyze the advantages of existing state-of-the-art methods according to their embedding strategies. Finally, we discuss some critical yet under-investigated areas and future research directions.