In this work, we study the problem of object re-identification (ReID) in a 3D multi-object tracking (MOT) context by learning to match pairs of objects from cropped point cloud observations (e.g., cropped using their predicted 3D bounding boxes). We are not concerned with state-of-the-art performance for 3D MOT, however. Instead, we seek to answer the following question: in a realistic tracking-by-detection context, how does object ReID from point clouds perform relative to ReID from images? To enable such a study, we propose a lightweight matching head that can be attached to any set- or sequence-processing backbone (e.g., PointNet or ViT), creating a family of comparable object ReID networks for both modalities. Run in Siamese style, our proposed point-cloud ReID networks can make thousands of pairwise comparisons in real time (10 Hz). Our findings demonstrate that their performance increases with sensor resolution and approaches that of image ReID when observations are sufficiently dense. Additionally, we investigate our networks' ability to enhance 3D MOT, showing that our point-cloud ReID networks can successfully re-identify objects that led a strong motion-based tracker into error. To our knowledge, we are the first to study real-time object re-identification from point clouds in a 3D multi-object tracking context.
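As an illustration of the Siamese arrangement described above, the following is a minimal sketch, not the matching head proposed in this work. It assumes a toy PointNet-style backbone (one shared per-point layer followed by max-pooling) with random, untrained weights, and a hypothetical head that scores symmetric pair features. All names (`make_linear`, `encode`, `match_score`) and all layer sizes are assumptions made for the example.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random (n_out x n_in) weights and zero bias; a stand-in for trained parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(layer, x):
    """Apply an affine layer to a single feature vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def encode(points, layer):
    """Toy set backbone: the same layer is applied to every point, then max-pooled.

    Max-pooling over points makes the embedding independent of point order
    and of the number of points in the crop.
    """
    feats = [relu(linear(layer, p)) for p in points]
    return [max(col) for col in zip(*feats)]


def match_score(emb_a, emb_b, head):
    """Hypothetical matching head: symmetric pair features -> match probability."""
    pair = [abs(a - b) for a, b in zip(emb_a, emb_b)]
    pair += [a * b for a, b in zip(emb_a, emb_b)]
    logit = linear(head, pair)[0]
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
backbone = make_linear(3, 16, rng)   # shared by both branches (Siamese)
head = make_linear(32, 1, rng)       # input: 16 abs-differences + 16 products

# Two synthetic "cropped" observations with different point counts.
crop_a = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(64)]
crop_b = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(48)]

emb_a = encode(crop_a, backbone)
emb_b = encode(crop_b, backbone)
score_ab = match_score(emb_a, emb_b, head)
score_ba = match_score(emb_b, emb_a, head)
```

Because each observation is encoded once and only the small head runs per pair, the cost of many pairwise comparisons in such a design is dominated by the head. This is one plausible reason a Siamese layout suits the high comparison counts mentioned above. The weights here are untrained, so the scores carry no meaning beyond demonstrating the data flow.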