This paper addresses the challenges of traffic sign detection in self-driving vehicles and driver assistance systems. Reliable, highly accurate algorithms are crucial for the widespread adoption of traffic sign recognition and detection (TSRD) in diverse real-world scenarios. However, the task is complicated by low-quality traffic images degraded by camera movement, adverse weather, and inadequate lighting. This study focuses on traffic sign detection and applies the Transformer model, in particular Vision Transformer variants, to the task. The Transformer's attention mechanism, originally designed for natural language processing, relates all input positions at once rather than step by step, which gives it greater parallel efficiency than sequential recurrent models. Vision Transformers have been applied successfully in many domains, including autonomous driving, object detection, healthcare, and defense. To make the Transformer more efficient, this study proposes a novel strategy that integrates a locality inductive bias into a transformer module. The strategy introduces an Efficient Convolution Block and a Local Transformer Block, which together capture short-range (local) and long-range (global) dependencies, improving both detection speed and accuracy. Experimental evaluations show significant improvements from this approach, particularly on the GTSDB dataset.
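The abstract names the Efficient Convolution Block and the Local Transformer Block but does not specify their internals. As a hedged illustration of the general idea only (pairing a convolutional locality bias with global self-attention), the minimal pure-Python sketch below assumes a 1-D sequence of token feature vectors, a fixed 3-tap smoothing kernel, and single-head attention with identity query, key, and value projections. The functions `local_conv`, `self_attention`, and `hybrid_block` are hypothetical and are not the paper's blocks.

```python
import math

def local_conv(x, kernel=(0.25, 0.5, 0.25)):
    """Depthwise 1-D convolution over token positions with zero padding.
    Each output token mixes only its immediate neighbours (locality bias)."""
    n, d = len(x), len(x[0])
    r = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - r  # neighbour position covered by kernel tap k
            if 0 <= j < n:
                for c in range(d):
                    row[c] += w * x[j][c]
        out.append(row)
    return out

def self_attention(x):
    """Single-head scaled dot-product self-attention with identity projections
    (Q = K = V = x), so every token can attend to every other token."""
    n, d = len(x), len(x[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for i in range(n):
        scores = [scale * sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        out.append([sum(weights[j] * x[j][c] for j in range(n)) for c in range(d)])
    return out

def hybrid_block(x):
    """Hypothetical hybrid block: residual local convolution, then residual global attention."""
    x = [[a + b for a, b in zip(xi, ci)] for xi, ci in zip(x, local_conv(x))]
    x = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, self_attention(x))]
    return x

if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    y = hybrid_block(tokens)
    print(len(y), len(y[0]))
```

In this sketch, an impulse at one end of the sequence leaves distant outputs of `local_conv` unchanged, while `self_attention` spreads it to every position; a real detector would instead use 2-D convolutions over image feature maps and learned query, key, and value projections.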