SUPS: A Simulated Underground Parking Scenario Dataset for Autonomous Driving

Automatic underground parking has attracted considerable attention as the scope of autonomous driving expands. An autonomous vehicle must perceive its environment, track its location, and build a reliable map of the scene. Mainstream solutions combine well-trained neural networks with simultaneous localization and mapping (SLAM) methods, which require large numbers of carefully labeled images and measurements from multiple sensors. However, there is a lack of underground parking datasets that provide multiple sensors and well-labeled images and that support both SLAM tasks and perception tasks such as semantic segmentation and parking slot detection. In this paper, we present SUPS, a simulated dataset for underground automatic parking, which supports multiple tasks with multiple sensors and multiple semantic labels aligned with successive images according to timestamps. We aim to address the shortcomings of existing datasets by exploiting the variability of environments and the diversity and accessibility of sensors that a virtual scene affords. Specifically, the dataset records frames from four surround-view fisheye cameras, two forward-facing pinhole cameras, and a depth camera, along with data from a LiDAR, an inertial measurement unit (IMU), and GNSS. Pixel-level semantic labels are provided for objects, especially ground signs such as arrows, parking lines, lanes, and speed bumps. Our dataset supports perception, 3D reconstruction, depth estimation, SLAM, and other related tasks. We also evaluate state-of-the-art SLAM algorithms and perception models on our dataset. Finally, we open-source our virtual 3D scene, built with the Unity engine, and release our dataset at https://github.com/jarvishou829/SUPS.
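The abstract states that semantic labels and sensor streams are aligned with successive images according to timestamps. The sketch below illustrates one common way to perform such an alignment, nearest-timestamp matching with a tolerance. It is an assumption for illustration only, not the procedure used to build SUPS: the function names `nearest_index` and `align`, the `max_gap` parameter, and the sample timestamps are all hypothetical and do not come from the dataset or its tooling.

```python
from bisect import bisect_left


def nearest_index(sorted_stamps, t):
    """Return the index of the timestamp in sorted_stamps closest to t."""
    if not sorted_stamps:
        raise ValueError("sorted_stamps must not be empty")
    i = bisect_left(sorted_stamps, t)
    if i == 0:
        return 0
    if i == len(sorted_stamps):
        return len(sorted_stamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier stamp.
    before, after = sorted_stamps[i - 1], sorted_stamps[i]
    return i - 1 if (t - before) <= (after - t) else i


def align(reference_stamps, other_stamps, max_gap):
    """Pair each reference timestamp with the nearest one in another stream.

    Returns a list of (ref_index, other_index) pairs. Reference frames whose
    nearest neighbour is farther away than max_gap are skipped.
    """
    pairs = []
    for ref_idx, t in enumerate(reference_stamps):
        j = nearest_index(other_stamps, t)
        if abs(other_stamps[j] - t) <= max_gap:
            pairs.append((ref_idx, j))
    return pairs


# Hypothetical timestamps in seconds (illustrative, not taken from SUPS).
camera_stamps = [0.00, 0.10, 0.20, 0.30]
label_stamps = [0.01, 0.11, 0.19, 0.45]
print(align(camera_stamps, label_stamps, max_gap=0.05))
```

With these sample values, the first three camera frames each find a label within 0.05 s, while the last frame is dropped because its nearest label is too far away. The tolerance trades coverage against alignment accuracy, so an appropriate value depends on the sensor rates involved.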
