Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning

From arXiv. 20 pages, 6 figures. Accepted by the 103rd Transportation Research Board (TRB) Annual Meeting; under review by Transportation Research Record: Journal of the Transportation Research Board.

The burgeoning navigation services based on digital maps provide great convenience to drivers. Nevertheless, anomalies in lane rendering map images occasionally introduce potential hazards, as such anomalies can mislead human drivers and consequently contribute to unsafe driving conditions. To detect these anomalies accurately and efficiently, this paper recasts lane rendering image anomaly detection as a classification problem and proposes a four-phase pipeline that leverages state-of-the-art deep learning techniques, especially Transformer models: (1) data pre-processing, (2) self-supervised pre-training with the masked image modeling (MiM) method, (3) customized fine-tuning using a cross-entropy-based loss with label smoothing, and (4) post-processing. Various experiments verify the effectiveness of the proposed pipeline. Results indicate that it achieves superior performance in lane rendering image anomaly detection and, notably, that self-supervised pre-training with MiM greatly improves detection accuracy while significantly reducing total training time. For instance, the Swin Transformer with Uniform Masking as self-supervised pre-training (Swin-Trans-UM) achieved an accuracy of 94.77% and an Area Under the Curve (AUC) score of 0.9743, compared with an accuracy of 94.01% and an AUC of 0.9498 for the pure Swin Transformer without pre-training (Swin-Trans). The number of fine-tuning epochs was also dramatically reduced, from the original 280 to 41. In conclusion, by incorporating self-supervised pre-training with MiM and other advanced deep learning techniques, the proposed pipeline emerges as a robust solution for improving the accuracy and efficiency of lane rendering image anomaly detection in digital navigation systems.
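
The abstract names a cross-entropy-based loss with label smoothing for the fine-tuning phase but does not give its exact form or smoothing value. The following is a minimal sketch in pure Python, assuming the standard formulation in which the one-hot target is mixed with a uniform distribution over the K classes, and assuming a two-class setup (normal versus anomalous rendering). The function names `log_softmax` and `label_smoothing_ce` and the default `epsilon = 0.1` are hypothetical choices for illustration, not the authors' implementation or settings.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def label_smoothing_ce(logits: Sequence[float], target: int, epsilon: float = 0.1) -> float:
    """Cross-entropy against a label-smoothed target distribution.

    The one-hot target is mixed with a uniform distribution over K classes:
        q_k = (1 - epsilon) * [k == target] + epsilon / K
    epsilon = 0.1 is a hypothetical default, not a value reported in the paper.
    """
    k = len(logits)
    if not 0 <= target < k:
        raise ValueError(f"target {target} out of range for {k} classes")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must lie in [0, 1]")
    log_p = log_softmax(logits)
    loss = 0.0
    for i, lp in enumerate(log_p):
        # Smoothed target probability for class i.
        q = (1.0 - epsilon) * (1.0 if i == target else 0.0) + epsilon / k
        loss -= q * lp
    return loss


# Hypothetical two-class setup: index 0 = normal rendering, index 1 = anomalous.
plain = label_smoothing_ce([2.0, 0.0], target=0, epsilon=0.0)
smoothed = label_smoothing_ce([2.0, 0.0], target=0, epsilon=0.1)
print(f"plain CE = {plain:.4f}, smoothed CE = {smoothed:.4f}")
```

With logits `[2.0, 0.0]` and target 0, `epsilon = 0` reduces to plain cross-entropy, log(1 + e^-2) ≈ 0.127, while `epsilon = 0.1` raises the loss to about 0.227 even though the prediction is correct. That extra penalty on confident logits is the usual motivation for label smoothing; whether the paper's "customized" loss adds further terms is not stated in the abstract.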
