Data association is a difficult problem in 2D multiple object tracking (MOT) because of object occlusion. In 3D space, however, data association is much easier: an online tracker can associate LiDAR detections using only a 3D Kalman filter. In this paper, we rethink data association in 2D MOT and use 3D object representations to separate objects in feature space. Unlike existing depth-based MOT methods, our 3D object representation can be learned jointly with the object association module. Moreover, the 3D representation is learned from video and supervised by 2D tracking labels, without additional manual annotations from LiDAR or a pretrained depth estimator. By learning 3D object representations from pseudo 3D object labels in monocular videos, we propose a new 2D MOT paradigm called P3DTrack. Extensive experiments demonstrate the effectiveness of our method, which achieves new state-of-the-art performance on the large-scale Waymo Open Dataset.
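To make the remark about 3D association concrete, the following is a minimal sketch of how a constant-velocity Kalman filter plus distance-based matching can associate 3D detections. It is not the P3DTrack method. The class `ConstantVelocityKF3D`, the function `greedy_associate`, the noise values, the time step, and the gating distance are all hypothetical choices made for illustration, and greedy nearest-neighbor matching stands in for whatever assignment step a real tracker would use.

```python
import numpy as np


class ConstantVelocityKF3D:
    """Hypothetical constant-velocity Kalman filter over [x, y, z, vx, vy, vz]."""

    def __init__(self, position, dt=0.1):
        self.x = np.concatenate([np.asarray(position, dtype=float), np.zeros(3)])
        self.P = np.eye(6) * 10.0                # initial uncertainty (assumed)
        self.F = np.eye(6)
        self.F[:3, 3:] = np.eye(3) * dt          # position += velocity * dt
        self.H = np.hstack([np.eye(3), np.zeros((3, 3))])  # only position is observed
        self.Q = np.eye(6) * 0.01                # process noise (assumed)
        self.R = np.eye(3) * 0.1                 # measurement noise (assumed)

    def predict(self):
        """Propagate the state one step and return the predicted 3D position."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:3]

    def update(self, z):
        """Correct the state with an observed 3D position z."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R            # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P


def greedy_associate(predicted, detections, gate=2.0):
    """Greedily match predicted positions to detections by Euclidean distance.

    Returns {track_index: detection_index}. Pairs farther apart than
    `gate` (an assumed threshold, in the same units as the positions)
    are left unmatched.
    """
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 3)
    detections = np.asarray(detections, dtype=float).reshape(-1, 3)
    matches = {}
    if len(predicted) == 0 or len(detections) == 0:
        return matches
    dist = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=2)
    # Visit candidate pairs from closest to farthest.
    for flat in np.argsort(dist, axis=None):
        t, d = (int(i) for i in np.unravel_index(flat, dist.shape))
        if dist[t, d] > gate:
            break
        if t not in matches and d not in matches.values():
            matches[t] = d
    return matches


# Two objects on nearly the same line of sight but at different depths.
tracks = [ConstantVelocityKF3D([0.0, 0.0, 10.0]), ConstantVelocityKF3D([0.0, 0.0, 30.0])]
predicted = [kf.predict() for kf in tracks]
detections = [[0.1, 0.0, 30.2], [0.05, 0.0, 10.1]]
print(greedy_associate(predicted, detections))  # {0: 1, 1: 0}
```

In this toy setup the two objects would overlap heavily in the image plane, yet they are 20 units apart along the depth axis, so a plain distance gate separates them without any appearance features.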