In this work, we build a modular codebase, formulate strong training recipes, design an error diagnosis toolbox, and discuss current methods for image-based 3D object detection. Unlike highly mature tasks such as 2D object detection, image-based 3D object detection is still evolving, and methods often adopt different training recipes and tricks, resulting in unfair evaluations and comparisons. Worse, the gains from these tricks may overwhelm those of the proposed designs, even leading to wrong conclusions. To address this issue, we build a modular codebase and formulate unified training standards for the community. We also design an error diagnosis toolbox to characterize detection models in detail. Using these tools, we analyze current methods in depth under varying settings and discuss some open questions, e.g., the discrepancies between conclusions drawn on the KITTI-3D and nuScenes datasets, which have led to different dominant methods for these datasets. We hope that this work will facilitate future research in image-based 3D object detection. Our code will be released at \url{https://github.com/OpenGVLab/3dodi}.