The rapid advancement of deep learning has intensified the need for comprehensive data for autonomous driving algorithms. High-quality datasets are crucial for the development of effective data-driven autonomous driving solutions. Next-generation autonomous driving datasets must be multimodal, incorporating data from advanced sensors with extensive data coverage, detailed annotations, and diverse scene representation. To address this need, we present OmniHD-Scenes, a large-scale multimodal dataset that provides comprehensive omnidirectional high-definition data. The OmniHD-Scenes dataset combines data from a 128-beam LiDAR, six cameras, and six 4D imaging radars to achieve full environmental perception. The dataset comprises 1501 clips, each approximately 30 s long, totaling more than 450K synchronized frames and more than 5.85 million synchronized sensor data points. We also propose a novel 4D annotation pipeline. To date, we have annotated 200 clips with more than 514K precise 3D bounding boxes, and these clips also include semantic segmentation annotations for static scene elements. Additionally, we introduce a novel automated pipeline for generating dense occupancy ground truth, which effectively leverages information from non-key frames. Alongside the dataset, we establish comprehensive evaluation metrics, baseline models, and benchmarks for 3D detection and semantic occupancy prediction. These benchmarks utilize surround-view cameras and 4D imaging radar to explore cost-effective sensor solutions for autonomous driving applications. Extensive experiments demonstrate the effectiveness of our low-cost sensor configuration and its robustness under adverse conditions. Data will be released at https://www.2077ai.com/OmniHD-Scenes.