Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.