In this work, we build a modular codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. Unlike highly mature tasks such as 2D object detection, image-based 3D object detection is still an evolving field, in which methods often adopt different training recipes and tricks, resulting in unfair evaluations and comparisons. Worse, the gains from these tricks may overshadow those of the proposed designs and even lead to wrong conclusions. To address this issue, we build a modular codebase and formulate unified training standards for the community. Furthermore, we design an error diagnosis toolbox that characterizes the errors of detection models in detail. Using these tools, we analyze current methods in depth under varying settings and discuss several open questions, e.g., the discrepancy between the conclusions drawn on the KITTI-3D and nuScenes datasets, which has led to different methods dominating each dataset. We hope that this work will facilitate future research in image-based 3D object detection. Our code will be released at \url{https://github.com/OpenGVLab/3dodi}.