Current deep neural networks (DNNs) for autonomous-driving computer vision are typically trained on datasets that cover only a single type of data and only urban scenes. Consequently, these models struggle with new objects, noise, nighttime conditions, and diverse scenarios, even though handling such conditions is essential for safety-critical applications. Despite ongoing efforts to improve the resilience of computer vision DNNs, progress has been slow, partly because benchmarks featuring multiple modalities are lacking. We introduce InfraParis, a novel and versatile dataset that supports multiple tasks across three modalities: RGB, depth, and infrared. We assess a range of state-of-the-art baseline techniques on it, covering models for semantic segmentation, object detection, and depth estimation.