As self-driving technology advances toward widespread adoption, determining safe operational thresholds across varying environmental conditions becomes critical for public safety. This paper proposes a method for evaluating the robustness of object detection ML models in autonomous vehicles under adverse weather conditions. It employs data augmentation operators to generate synthetic data that simulate adverse operating conditions at progressively increasing intensity levels, in order to find the lowest intensity at which the object detection model fails. The robustness of the object detection model is measured by the average first failure coefficient (AFFC) over the input images in the benchmark. The paper reports an experiment with four object detection models (YOLOv5s, YOLOv11s, Faster R-CNN, and Detectron2) using seven data augmentation operators that simulate the weather conditions of fog, rain, and snow and the lighting conditions of dark, bright, flaring, and shadow. The experimental data show that the method is feasible, effective, and efficient for evaluating and comparing the robustness of object detection models in various adverse operating conditions. In particular, the Faster R-CNN model achieved the highest robustness, with an overall average AFFC of 71.9% over all seven adverse conditions, while the YOLO variants showed AFFC values of 43%. The method is also applied to assess how training that targets adverse operating conditions with synthetic data affects model robustness. It is observed that such training can improve robustness in adverse conditions but may suffer from diminishing returns and forgetting (i.e., a decline in robustness) if the model is overtrained.
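The evaluation loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact definition of the first failure coefficient, so the sketch assumes it is the lowest tested intensity (in percent) at which the detector's output is no longer correct, and that an image on which the detector never fails scores 100. The names `first_failure_level`, `average_first_failure`, `toy_fog`, and `toy_detector_ok` are hypothetical, and the "images" are single integers standing in for image contrast so that the example runs without any model or dataset.

```python
from typing import Callable, Sequence, TypeVar

Image = TypeVar("Image")


def first_failure_level(
    image: Image,
    augment: Callable[[Image, int], Image],
    is_correct: Callable[[Image], bool],
    levels: Sequence[int],
) -> int:
    """Return the lowest intensity (percent) at which detection fails.

    Assumption: levels are sorted in increasing order, and an image on
    which the detector never fails is scored 100.
    """
    for level in levels:
        if not is_correct(augment(image, level)):
            return level
    return 100


def average_first_failure(
    images: Sequence[Image],
    augment: Callable[[Image, int], Image],
    is_correct: Callable[[Image], bool],
    levels: Sequence[int],
) -> float:
    """Average the first-failure levels over a benchmark of images."""
    if not images:
        raise ValueError("benchmark must contain at least one image")
    total = sum(
        first_failure_level(img, augment, is_correct, levels) for img in images
    )
    return total / len(images)


# Toy stand-ins (hypothetical): an "image" is just its contrast, 0..100.
def toy_fog(contrast: int, level: int) -> int:
    """Pretend fog at a given intensity lowers contrast by that amount."""
    return contrast - level


def toy_detector_ok(contrast: int) -> bool:
    """Pretend the detector is correct while contrast stays at 30 or above."""
    return contrast >= 30


LEVELS = range(0, 101, 10)  # intensities 0%, 10%, ..., 100%
score = average_first_failure([60, 100], toy_fog, toy_detector_ok, LEVELS)
print(score)  # 60.0: the two images first fail at 40% and 80%
```

In a real setting, `augment` would be one of the seven weather or lighting operators and `is_correct` would compare the model's detections on the augmented image against ground truth; a higher average indicates a model that tolerates stronger degradation before failing.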