The recent development of online static map element (a.k.a. HD map) construction algorithms has created a vast demand for data with ground-truth annotations. However, currently available public datasets cannot provide high-quality training data in terms of consistency and accuracy. For instance, the manually labelled (and therefore inefficient to produce) nuScenes dataset still contains misalignments and inconsistencies between its HD maps and images (e.g., around 8.03 pixels of reprojection error on average). To this end, we present CAMAv2: a vision-centric approach for Consistent and Accurate Map Annotation. Without LiDAR inputs, our proposed framework can still generate high-quality 3D annotations of static map elements. Specifically, the annotations achieve high reprojection accuracy across all surrounding cameras and are spatio-temporally consistent across the whole sequence. We apply the proposed framework to the popular nuScenes dataset to provide efficient and highly accurate annotations. Compared with the original nuScenes static map elements, our CAMAv2 annotations achieve lower reprojection errors (e.g., 4.96 vs. 8.03 pixels). Models trained on CAMAv2 annotations likewise achieve lower reprojection errors (e.g., 5.62 vs. 8.43 pixels).
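The reprojection-error figures above (e.g., 4.96 vs. 8.03 pixels) are mean pixel distances between projected 3D map annotations and their observed image-plane locations. A minimal sketch of such a metric, assuming a pinhole camera with intrinsics K and a world-to-camera pose (R, t); the function name and toy data are illustrative, not from the paper:

```python
import numpy as np

def reprojection_error(points_3d, points_2d, K, R, t):
    """Mean pixel distance between projected 3D points and their
    annotated 2D image locations (hypothetical helper, not the
    paper's implementation)."""
    # Transform world points into the camera frame: X_cam = R @ X + t
    cam = (R @ points_3d.T).T + t
    # Perspective projection with intrinsics K, then dehomogenize
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    # Per-point Euclidean pixel error, averaged over all points
    return np.linalg.norm(uv - points_2d, axis=1).mean()

# Toy example: identity pose, simple pinhole intrinsics
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.zeros(3)
pts3d = np.array([[0.0, 0.0, 2.0], [0.5, -0.2, 4.0]])

# Build exact projections, then offset one observation by 3 px
proj = (K @ ((R @ pts3d.T).T + t).T).T
proj = proj[:, :2] / proj[:, 2:3]
proj[0, 0] += 3.0

err = reprojection_error(pts3d, proj, K, R, t)  # mean of 3 px and 0 px
```

Averaging this quantity over all map elements, cameras, and frames in a sequence yields the dataset-level numbers quoted above.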