This paper proposes a fast and online method for jointly performing 3D multi-object tracking and pose estimation using multiple monocular cameras. Our algorithm requires only 2D bounding box and pose detections, eliminating the need for costly 3D training data or computationally expensive deep learning models. Our solution is an efficient implementation of a Bayes-optimal multi-object tracking filter, enhancing computational efficiency while maintaining accuracy. We demonstrate that our algorithm is significantly faster than state-of-the-art methods without compromising accuracy, using only publicly available pre-trained 2D detection models. We also illustrate the robust performance of our algorithm in scenarios where multiple cameras are intermittently disconnected or reconnected during operation.