Accurate 3D object detection (3DOD) is crucial for safe navigation of complex environments by autonomous robots. Regressing accurate 3D bounding boxes in cluttered environments from sparse LiDAR data is, however, a highly challenging problem. We address this task by exploring recent advances in conditional energy-based models (EBMs) for probabilistic regression. While methods employing EBMs for regression have demonstrated impressive performance on 2D object detection in images, these techniques are not directly applicable to 3D bounding boxes. In this work, we therefore design a differentiable pooling operator for 3D bounding boxes, which serves as the core module of our EBM network. We further integrate this general approach into the state-of-the-art 3D object detector SA-SSD. On the KITTI dataset, our proposed approach consistently outperforms the SA-SSD baseline across all 3DOD metrics, demonstrating the potential of EBM-based regression for highly accurate 3DOD. Code is available at https://github.com/fregu856/ebms_3dod.