In this work, we study the problem of object re-identification (ReID) in a 3D multi-object tracking (MOT) context by learning to match pairs of objects from cropped point cloud observations (e.g., cropped using their predicted 3D bounding boxes). We are not concerned with state-of-the-art 3D MOT performance, however. Instead, we seek to answer the following question: in a realistic tracking-by-detection context, how does object ReID from point clouds perform relative to ReID from images? To enable such a study, we propose a lightweight matching head that can be attached to any set- or sequence-processing backbone (e.g., PointNet or ViT), creating a family of comparable object ReID networks for both modalities. Run in Siamese style, our proposed point cloud ReID networks can make thousands of pairwise comparisons in real time (10 Hz). Our findings demonstrate that their performance increases with sensor resolution and approaches that of image ReID when observations are sufficiently dense. Additionally, we investigate our networks' ability to enhance 3D MOT, showing that our point cloud ReID networks can successfully re-identify objects that led a strong motion-based tracker into error. To our knowledge, we are the first to study real-time object re-identification from point clouds in a 3D MOT context.
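As a rough illustration of the Siamese matching setup described above, the following NumPy sketch encodes each cropped point cloud once with a shared, PointNet-like encoder and then scores every track-detection pair with a small head. It is a minimal sketch under stated assumptions, not the architecture proposed in this work: the abstract does not specify the matching head's design, so the pair features (absolute difference and elementwise product), the layer sizes, the random weights, and the names `encode` and `match_scores` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, randomly initialised weights; a real network would learn these.
W_point = rng.normal(size=(3, 64))    # per-point layer of a PointNet-like encoder
W_head = rng.normal(size=(128, 1))    # stand-in for the matching head


def encode(points):
    """Shared (Siamese) encoder: per-point linear layer + ReLU, then max-pool."""
    feats = np.maximum(points @ W_point, 0.0)   # (N, 64)
    return feats.max(axis=0)                    # (64,), invariant to point order


def match_scores(tracks, detections):
    """Score every (track, detection) pair; returns a (T, D) matrix in [0, 1]."""
    t = np.stack([encode(p) for p in tracks])        # (T, 64), each object encoded once
    d = np.stack([encode(p) for p in detections])    # (D, 64)
    # Assumed symmetric pair features: absolute difference and elementwise product.
    diff = np.abs(t[:, None, :] - d[None, :, :])     # (T, D, 64)
    prod = t[:, None, :] * d[None, :, :]             # (T, D, 64)
    pair = np.concatenate([diff, prod], axis=-1)     # (T, D, 128)
    logits = (pair @ W_head)[..., 0]                 # (T, D)
    return 0.5 * (1.0 + np.tanh(0.5 * logits))       # numerically stable sigmoid


# Cropped point clouds may contain different numbers of points.
tracks = [rng.normal(size=(n, 3)) for n in (50, 120, 80)]
detections = [rng.normal(size=(n, 3)) for n in (60, 30)]
scores = match_scores(tracks, detections)            # shape (3, 2)
```

Because each observation is encoded once and only the lightweight head runs per pair, the number of backbone passes grows with the number of objects rather than the number of pairs, which is what makes a large number of pairwise comparisons per frame plausible.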