A crucial yet under-appreciated prerequisite for machine learning solutions in real-world applications is data annotation: human annotators are hired to manually label data according to detailed, expert-crafted guidelines. This is often a laborious, tedious, and costly process. To study methods for facilitating data annotation, we introduce a new benchmark, AnnoGuide: Auto-Annotation from Annotation Guidelines. It aims to evaluate automated methods that annotate data directly from expert-defined annotation guidelines, eliminating the need for manual labeling. As a case study, we repurpose the well-established nuScenes dataset, commonly used in autonomous driving research, which provides comprehensive annotation guidelines for labeling LiDAR point clouds with 3D cuboids across 18 object classes. These guidelines include a few visual examples and textual descriptions, but no labeled 3D cuboids in LiDAR data, making this a novel task of multi-modal few-shot 3D detection without 3D annotations. Advances in powerful foundation models (FMs) make AnnoGuide especially timely, as FMs offer promising tools to tackle its challenges. We employ a conceptually straightforward pipeline that (1) utilizes open-source FMs for object detection and segmentation in RGB images, (2) projects 2D detections into 3D using known camera poses, and (3) clusters LiDAR points within the frustum of each 2D detection to generate a 3D cuboid. Starting with a non-learned solution that leverages off-the-shelf FMs, we progressively refine key components and achieve significant performance improvements, boosting 3D detection mAP from 12.1 to 21.9! Nevertheless, our results highlight that AnnoGuide remains an open and challenging problem, underscoring the urgent need for developing LiDAR-based FMs. We release our code and models on GitHub: https://annoguide.github.io/annoguide3Dbenchmark
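The frustum-clustering pipeline described above can be illustrated with a minimal sketch. This is not the released implementation: the function names, the greedy single-link clustering (a stand-in for DBSCAN-style grouping), and the axis-aligned box fit (real pipelines also estimate yaw) are all simplifying assumptions for exposition.

```python
import numpy as np

def points_in_frustum(points_cam, box_2d, K):
    """Step 2: keep LiDAR points (already transformed into camera
    coordinates via known camera poses) whose pinhole projection falls
    inside a 2D detection box (x1, y1, x2, y2) from an image FM."""
    x1, y1, x2, y2 = box_2d
    front = points_cam[:, 2] > 0.1            # points in front of the camera
    uvw = (K @ points_cam.T).T                # project with intrinsics K
    u = uvw[:, 0] / uvw[:, 2]
    v = uvw[:, 1] / uvw[:, 2]
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points_cam[front & inside]

def largest_cluster(points, eps=0.7):
    """Step 3a: greedy single-link Euclidean clustering; keep the largest
    cluster to drop background points caught in the frustum."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        stack, labels[i] = [i], cur
        while stack:
            j = stack.pop()
            d = np.linalg.norm(points - points[j], axis=1)
            for k in np.where((d < eps) & (labels == -1))[0]:
                labels[k] = cur
                stack.append(k)
        cur += 1
    best = np.argmax(np.bincount(labels))
    return points[labels == best]

def fit_cuboid(points):
    """Step 3b: fit an axis-aligned cuboid (center, size) to the cluster."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    return (lo + hi) / 2.0, hi - lo
```

A hypothetical usage: `fit_cuboid(largest_cluster(points_in_frustum(pts, box, K)))` yields one 3D cuboid per 2D detection, which is the unit of output the benchmark evaluates with mAP.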