Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset

Arthur Zhang, Chaitanya Eranki, Christina Zhang, Ji-Hwan Park, Raymond Hong, Pranav Kalyani, Lochana Kalyanaraman, Arsh Gamare, Arnav Bagad, Maria Esteva, Joydeep Biswas

From arXiv: 19 pages, 18 figures, 12 tables. Website: https://amrl.cs.utexas.edu/coda

We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric perception dataset collected on the University of Texas at Austin campus. Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps; RGB-D video from an additional 0.5MP sensor at 7 fps; and inertial data from a 9-DOF IMU at 40 Hz. We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5000 frames of 3D semantic annotations for urban terrain, and pseudo-ground-truth localization. We repeatedly traverse identical geographic locations across a wide range of indoor and outdoor areas, weather conditions, and times of day. Using CODa, we empirically demonstrate that: 1) 3D object detection performance in urban settings is significantly higher when models are trained on CODa than on existing datasets, even when employing state-of-the-art domain adaptation approaches; 2) sensor-specific fine-tuning improves 3D object detection accuracy; and 3) pretraining on CODa improves cross-dataset 3D object detection performance in urban settings compared to pretraining on AV datasets. Using our dataset and annotations, we release benchmarks for 3D object detection and 3D semantic segmentation using established metrics. In the future, the CODa benchmark will include additional tasks such as unsupervised object discovery and re-identification. We publicly release CODa on the Texas Data Repository, along with pre-trained models, a dataset development package, and an interactive dataset viewer. We expect CODa to be a valuable dataset for research in egocentric 3D perception and planning for autonomous navigation in urban environments.
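To give a rough sense of scale, the figures quoted in the abstract can be turned into approximate frame counts. The sketch below is only back-of-envelope arithmetic: it assumes, without confirmation from the source, that annotated frames follow the 10 fps LiDAR/camera rate, and all variable names are illustrative.

```python
# Back-of-envelope scale estimates from the figures quoted in the abstract.
# ASSUMPTION (not stated in the source): annotated frames are sampled at the
# same 10 fps rate as the LiDAR and stereo cameras.

TOTAL_HOURS = 8.5            # total multimodal sensor data
ANNOTATED_MINUTES = 58       # ground-truth annotated duration
FRAME_RATE_HZ = 10           # LiDAR / stereo camera rate
NUM_BOXES = 1_300_000        # "1.3 million 3D bounding boxes"


def frame_count(duration_seconds: float, rate_hz: float) -> int:
    """Number of frames captured over a duration at a fixed rate."""
    return round(duration_seconds * rate_hz)


total_frames = frame_count(TOTAL_HOURS * 3600, FRAME_RATE_HZ)
annotated_frames = frame_count(ANNOTATED_MINUTES * 60, FRAME_RATE_HZ)

# Average number of 3D boxes per annotated frame under the assumption above.
boxes_per_frame = NUM_BOXES / annotated_frames

# Share of the recorded duration that carries ground-truth annotations.
annotated_fraction = ANNOTATED_MINUTES / (TOTAL_HOURS * 60)

print(f"total frames (approx.):     {total_frames}")
print(f"annotated frames (approx.): {annotated_frames}")
print(f"boxes per annotated frame:  {boxes_per_frame:.1f}")
print(f"annotated share of data:    {annotated_fraction:.1%}")
```

Under that assumption, roughly one ninth of the recorded duration is annotated, with a few dozen boxes per annotated frame on average; the actual per-frame counts depend on the dataset's real annotation rate.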
