Current deep neural networks (DNNs) for autonomous driving computer vision are typically trained on specific datasets that involve only a single type of data and only urban scenes. Consequently, these models struggle to handle new objects, noise, nighttime conditions, and diverse scenarios, even though handling them is essential for safety-critical applications. Despite ongoing efforts to improve the robustness of computer vision DNNs, progress has been slow, partly due to the absence of benchmarks featuring multiple modalities. We introduce a novel and versatile dataset named InfraParis that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation. More visualizations and the download link for InfraParis are available at \href{https://ensta-u2is.github.io/infraParis/}{https://ensta-u2is.github.io/infraParis/}.