Transformer architectures have shown promising performance in a range of autonomous driving applications in recent years. Dedicated hardware acceleration of these models on portable computing platforms has, in turn, become the next critical step toward practical deployment in real autonomous vehicles. This survey provides a comprehensive overview, benchmark, and analysis of Transformer-based models tailored to autonomous driving tasks such as lane detection, segmentation, tracking, planning, and decision-making. We review different architectures for organizing Transformer inputs and outputs, such as encoder-decoder and encoder-only structures, and examine their respective advantages and disadvantages. We then discuss Transformer-related operators and their hardware acceleration schemes in depth, taking into account key factors such as quantization and runtime. In particular, we present an operator-level comparison of layers from convolutional neural networks, the Swin-Transformer, and a Transformer with a 4D encoder. Finally, the paper highlights the challenges, trends, and current insights in Transformer-based models, addressing their hardware deployment and acceleration in the context of long-term autonomous driving applications.