As part of the perception output of intelligent driving systems, static object detection (SOD) in 3D space provides crucial cues for understanding the driving environment. With the rapid deployment of deep neural networks for SOD tasks, the demand for high-quality training samples has soared. The traditional and reliable approach is manual labeling on dense LiDAR point clouds and reference images. Although most public driving datasets adopt this strategy to provide SOD ground truth (GT), it remains expensive (it requires LiDAR scanners) and inefficient (it is time-consuming and does not scale) in practice. This paper introduces VRSO, a visual-centric approach to static object annotation. VRSO is distinguished by its low cost, high efficiency, and high quality: (1) it recovers static objects in 3D space using only camera images as input; (2) it requires almost no manual labeling, since GT for SOD tasks is generated by an automatic reconstruction and annotation pipeline; and (3) experiments on the Waymo Open Dataset show that the mean reprojection error of VRSO annotations is only 2.6 pixels, about four times lower than that of the Waymo labels (10.6 pixels). Source code is available at: https://github.com/CaiYingFeng/VRSO.
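The reprojection error used to compare annotation quality measures, in pixels, how far an annotated 3D point lands from its observed image location once it is projected into the camera. The sketch below is a generic illustration, not the VRSO or Waymo evaluation code: it assumes a simple pinhole camera with no lens distortion, points already expressed in camera coordinates, and one-to-one correspondences between 3D points and 2D observations. The names `project` and `mean_reprojection_error` and their parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def project(point: Point3, fx: float, fy: float, cx: float, cy: float) -> Point2:
    """Project a 3D point in camera coordinates using an ideal pinhole model.

    Assumption: no lens distortion; (fx, fy) are focal lengths in pixels and
    (cx, cy) is the principal point.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def mean_reprojection_error(
    points_3d: Sequence[Point3],
    observed_2d: Sequence[Point2],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> float:
    """Average Euclidean pixel distance between projected and observed points."""
    if not points_3d or len(points_3d) != len(observed_2d):
        raise ValueError("need equally sized, non-empty point lists")
    total = 0.0
    for point, (u_obs, v_obs) in zip(points_3d, observed_2d):
        u, v = project(point, fx, fy, cx, cy)
        # Per-point error is the L2 distance in the image plane.
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points_3d)


if __name__ == "__main__":
    # Hypothetical intrinsics and points, chosen so the arithmetic is easy to check.
    pts = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]  # project to (500, 500) and (600, 500)
    obs = [(503.0, 504.0), (600.0, 500.0)]      # errors of 5 px and 0 px
    print(mean_reprojection_error(pts, obs, 1000.0, 1000.0, 500.0, 500.0))  # 2.5
```

Averaging the per-point pixel distance keeps the metric in the same unit as the figures quoted above (2.6 vs. 10.6 pixels); a real evaluation would also need each camera's extrinsics to move world-frame annotations into camera coordinates first.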