Towards Robust Robot 3D Perception in Urban Environments: The UT Campus Object Dataset

Arthur Zhang, Chaitanya Eranki, Christina Zhang, Ji-Hwan Park, Raymond Hong, Pranav Kalyani, Lochana Kalyanaraman, Arsh Gamare, Arnav Bagad, Maria Esteva, Joydeep Biswas

From arXiv: 19 pages, 18 figures, 12 tables

We introduce the UT Campus Object Dataset (CODa), a mobile robot egocentric perception dataset collected on the University of Texas at Austin campus. Our dataset contains 8.5 hours of multimodal sensor data: synchronized 3D point clouds and stereo RGB video from a 128-channel 3D LiDAR and two 1.25MP RGB cameras at 10 fps; RGB-D video from an additional 0.5MP sensor at 7 fps; and 9-DOF IMU data at 40 Hz. We provide 58 minutes of ground-truth annotations containing 1.3 million 3D bounding boxes with instance IDs for 53 semantic classes, 5,000 frames of 3D semantic annotations for urban terrain, and pseudo-ground-truth localization. We repeatedly traverse identical geographic locations across a wide range of indoor and outdoor areas, weather conditions, and times of day. Using CODa, we empirically demonstrate that: 1) 3D object detection performance in urban settings is significantly higher when models are trained on CODa than when they are trained on existing datasets, even when state-of-the-art domain adaptation approaches are employed; 2) sensor-specific fine-tuning improves 3D object detection accuracy; and 3) pretraining on CODa improves cross-dataset 3D object detection performance in urban settings compared to pretraining on AV datasets. Using our dataset and annotations, we release benchmarks for 3D object detection and 3D semantic segmentation using established metrics. In the future, the CODa benchmark will include additional tasks such as unsupervised object discovery and re-identification. We publicly release CODa on the Texas Data Repository, and provide pre-trained models, a dataset development package, and an interactive dataset viewer on our website at https://amrl.cs.utexas.edu/coda. We expect CODa to be a valuable dataset for research in egocentric 3D perception and planning for autonomous navigation in urban environments.
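To give a rough sense of scale, the figures quoted in the abstract can be turned into approximate frame counts. The short Python sketch below is only a back-of-envelope calculation, not part of the CODa development package: the constant and function names are hypothetical, and it assumes that the 58 annotated minutes are labeled at the full 10 fps LiDAR/camera rate, which the abstract does not state explicitly.

```python
# Back-of-envelope scale estimates derived from numbers in the abstract.
# All names here are illustrative; none come from the CODa devkit.

TOTAL_HOURS = 8.5          # total recorded sensor data
LIDAR_CAMERA_FPS = 10      # synchronized LiDAR + stereo RGB rate
ANNOTATED_MINUTES = 58     # duration with 3D bounding box annotations
NUM_BOXES = 1_300_000      # "1.3 million 3D bounding boxes"


def frame_count(duration_s: float, fps: float) -> int:
    """Number of frames captured over duration_s seconds at fps frames/second."""
    return round(duration_s * fps)


# Total synchronized LiDAR/camera frames in the whole dataset.
total_frames = frame_count(TOTAL_HOURS * 3600, LIDAR_CAMERA_FPS)

# ASSUMPTION: annotations cover every frame at the full 10 fps rate.
annotated_frames = frame_count(ANNOTATED_MINUTES * 60, LIDAR_CAMERA_FPS)

# Average object density and annotated share, under the assumption above.
boxes_per_frame = NUM_BOXES / annotated_frames
annotated_fraction = annotated_frames / total_frames

if __name__ == "__main__":
    print(f"total frames:        {total_frames}")
    print(f"annotated frames:    {annotated_frames}")
    print(f"avg boxes per frame: {boxes_per_frame:.1f}")
    print(f"annotated fraction:  {annotated_fraction:.1%}")
```

Under that assumption, roughly one ninth of the recorded frames carry 3D box annotations, with a few dozen boxes per annotated frame on average. If the annotation rate differs from 10 fps, the per-frame density scales accordingly.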
