Perception plays a crucial role in many robotic applications. However, existing well-annotated datasets are biased towards autonomous driving scenarios, while unlabelled SLAM datasets are quickly overfitted and often lack variation in environment and domain. To expand the frontier of these fields, we introduce a comprehensive dataset named MCD (Multi-Campus Dataset), featuring a wide range of sensing modalities, high-accuracy ground truth, and diverse, challenging environments across three Eurasian university campuses. MCD comprises both CCS (Classical Cylindrical Spinning) and NRE (Non-Repetitive Epicyclic) lidars, high-quality IMUs (Inertial Measurement Units), cameras, and UWB (Ultra-WideBand) sensors. Furthermore, in a pioneering effort, we introduce semantic annotations of 29 classes over 59k sparse NRE lidar scans across three domains, posing a novel challenge for existing semantic segmentation research on this largely unexplored lidar modality. Finally, we propose, for the first time to the best of our knowledge, continuous-time ground truth obtained by optimization-based registration of lidar-inertial data to large survey-grade prior maps. These maps are also publicly released, and each is several times the size of existing ones. We conduct a rigorous evaluation of numerous state-of-the-art algorithms on MCD, report their performance, and highlight the challenges awaiting solutions from the research community.