WildScenes: A Benchmark for 2D and 3D Semantic Segmentation in Large-scale Natural Environments

Kavisha Vidanapathirana, Joshua Knights, Stephen Hausler, Mark Cox, Milad Ramezani, Jason Jooste, Ethan Griffiths, Shaheer Mohamed, Sridha Sridharan, Clinton Fookes, Peyman Moghadam

From arXiv. Accepted in The International Journal of Robotics Research (IJRR).

Recent progress in semantic scene understanding has been enabled primarily by the availability of semantically annotated bi-modal (camera and LiDAR) datasets in urban environments. However, such annotated datasets are also needed for natural, unstructured environments to enable semantic perception in applications such as conservation, search and rescue, environment monitoring, and agricultural automation. We therefore introduce WildScenes, a bi-modal benchmark dataset consisting of multiple large-scale, sequential traversals in natural environments, with semantic annotations for high-resolution 2D images and dense 3D LiDAR point clouds, as well as accurate 6-DoF pose information. The data is (1) trajectory-centric, with accurate localization and globally aligned point clouds; (2) calibrated and synchronized to support bi-modal training and inference; and (3) collected in different natural environments over 6 months to support research on domain adaptation. Our 3D semantic labels are obtained via an efficient, automated process that transfers human-annotated 2D labels from multiple views into 3D point cloud sequences, circumventing the need for expensive and time-consuming human annotation in 3D. We introduce benchmarks for 2D and 3D semantic segmentation and evaluate a variety of recent deep-learning techniques to demonstrate the challenges of semantic segmentation in natural environments. We propose train-val-test splits for standard benchmarks as well as for domain adaptation benchmarks, and use an automated split generation technique to balance class label distributions across the splits. The WildScenes benchmark webpage is https://csiro-robotics.github.io/WildScenes, and the data is publicly available at https://data.csiro.au/collection/csiro:61541.
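The abstract does not spell out how the 2D labels are carried into 3D. As a minimal sketch of the general idea, assuming a pinhole camera with intrinsic matrix `K`, LiDAR points already expressed in a common world frame (which globally aligned point clouds make possible), and one world-to-camera pose per labelled image, each point can be projected into every view and given the label that wins a majority vote. The function names, the voting rule, and the `ignore_label` convention are hypothetical, and the sketch deliberately omits occlusion handling: a point hidden behind foreground vegetation would still receive the label of whatever is visible at its pixel.

```python
import numpy as np


def project_points(points_world, T_world_to_cam, K, width, height):
    """Project (N, 3) world points into pixel coordinates of one camera.

    Returns integer pixel columns u, rows v, and a mask of points that land
    inside the image and in front of the camera.
    """
    n = points_world.shape[0]
    homog = np.hstack([points_world, np.ones((n, 1))])     # (N, 4) homogeneous points
    pts_cam = (T_world_to_cam @ homog.T).T[:, :3]           # (N, 3) in camera frame
    uvw = (K @ pts_cam.T).T                                 # (N, 3) before perspective divide
    in_front = uvw[:, 2] > 1e-6                             # drop points behind the camera
    depth = np.where(in_front, uvw[:, 2], 1.0)              # dummy depth avoids divide-by-zero
    u = np.floor(uvw[:, 0] / depth).astype(np.int64)
    v = np.floor(uvw[:, 1] / depth).astype(np.int64)
    valid = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return u, v, valid


def transfer_labels(points_world, views, K, num_classes, ignore_label=-1):
    """Assign each 3D point the majority 2D label over all views that see it.

    views: iterable of (T_world_to_cam (4x4), label_image (H, W) of class ids).
    No occlusion test is performed (simplifying assumption of this sketch).
    """
    votes = np.zeros((points_world.shape[0], num_classes), dtype=np.int64)
    for T_world_to_cam, label_image in views:
        h, w = label_image.shape
        u, v, valid = project_points(points_world, T_world_to_cam, K, w, h)
        idx = np.nonzero(valid)[0]
        cls = label_image[v[idx], u[idx]]                   # label under each projected point
        np.add.at(votes, (idx, cls), 1)                     # one vote per (point, view)
    labels = votes.argmax(axis=1)
    labels[votes.sum(axis=1) == 0] = ignore_label           # points never observed
    return labels
```

Voting over several views is one way to smooth out per-image labelling noise; a fuller pipeline would also need a visibility test, for example comparing each point's depth against a depth buffer rendered from the same view.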
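The abstract likewise mentions an automated split generation technique without describing it. One simple way to approach the problem, shown here only as a hypothetical greedy heuristic and not as the method used for WildScenes, is to divide each traversal into contiguous chunks, count the labelled points per class in each chunk, and place the largest chunks first into whichever split keeps every class closest to the target train/val/test fractions. `balanced_split`, the chunk granularity, and the squared relative-deviation cost are all assumptions.

```python
import numpy as np


def balanced_split(chunk_hists, fractions):
    """Greedily assign chunks to splits so each class is divided ~ in `fractions`.

    chunk_hists: (num_chunks, num_classes) labelled-point counts per chunk.
    fractions:   target share per split, e.g. [0.7, 0.15, 0.15].
    Returns an array with the split index chosen for each chunk.
    """
    hists = np.asarray(chunk_hists, dtype=np.float64)
    fracs = np.asarray(fractions, dtype=np.float64)
    totals = hists.sum(axis=0)                       # points per class over all chunks
    target = np.outer(fracs, totals)                 # (num_splits, num_classes) goal counts
    scale = np.where(totals > 0, totals, 1.0)        # avoid division by zero for empty classes
    alloc = np.zeros_like(target)                    # counts assigned to each split so far
    assignment = np.empty(len(hists), dtype=np.int64)
    # Place large chunks first: they are the hardest to fit later.
    for i in np.argsort(-hists.sum(axis=1), kind="stable"):
        best_split, best_cost = 0, np.inf
        for s in range(len(fracs)):
            trial = alloc.copy()
            trial[s] += hists[i]
            # Relative deviation from target, so rare classes count as much as common ones.
            cost = np.sum(((trial - target) / scale) ** 2)
            if cost < best_cost:
                best_split, best_cost = s, cost
        alloc[best_split] += hists[i]
        assignment[i] = best_split
    return assignment
```

Normalizing by each class's total point count makes a given relative shortfall in a rare class cost as much as the same shortfall in a common one, and assigning whole chunks rather than individual scans keeps neighbouring, highly similar scans within the same split.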
