This paper proposes Detrive, a novel Transformer-based end-to-end autonomous driving model. It addresses a limitation of previous end-to-end models, which cannot detect the positions and sizes of traffic participants. Detrive has four parts. Its perception module is an end-to-end Transformer-based detection model. A multi-layer perceptron (MLP) serves as its feature fusion network. A recurrent neural network built from gated recurrent units (GRU) performs path planning. Two controllers regulate the vehicle's forward speed and steering angle. The model is trained with an online imitation learning method. To obtain a better training set, the teacher for imitation learning is a reinforcement learning agent that reads a ground-truth bird's-eye-view (BEV) map directly from the Carla simulator as its perceptual output. The trained model is evaluated on the Carla autonomous driving benchmark. The results show that the Transformer-detector-based end-to-end model has clear advantages in dynamic obstacle avoidance over traditional classifier-based end-to-end models.
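The abstract names the planning and control stages but does not specify their internals. The NumPy sketch below shows one plausible arrangement. It rests on several hypothetical assumptions: the fused MLP feature initializes the GRU hidden state, the GRU autoregressively emits waypoint offsets in the ego frame (x forward, y to the right), and the two controllers are PID controllers, one for speed and one for heading. The names `GRUCell`, `decode_waypoints`, `PID` and `control` are illustrative, as are the gains, the 0.5 s waypoint spacing and the throttle and brake thresholds. None of these are Detrive's published design, and the weights are random rather than trained.

```python
import math
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRUCell:
    """Minimal GRU cell: update gate z, reset gate r, candidate state n."""

    def __init__(self, input_dim, hidden_dim, rng):
        s = 1.0 / math.sqrt(hidden_dim)
        shape = (hidden_dim, input_dim + hidden_dim)
        self.W_z, self.W_r, self.W_n = (rng.uniform(-s, s, shape) for _ in range(3))
        self.b_z, self.b_r, self.b_n = (np.zeros(hidden_dim) for _ in range(3))

    def __call__(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.W_z @ xh + self.b_z)  # how much of the old state to keep
        r = sigmoid(self.W_r @ xh + self.b_r)  # how much old state feeds the candidate
        n = np.tanh(self.W_n @ np.concatenate([x, r * h]) + self.b_n)
        return (1.0 - z) * n + z * h


def decode_waypoints(fused_feature, cell, W_out, n_waypoints=4):
    """Roll out 2-D ego-frame waypoints autoregressively (hypothetical decoder)."""
    h = fused_feature  # assumption: fused feature is the initial hidden state
    wp = np.zeros(2)
    out = []
    for _ in range(n_waypoints):
        h = cell(wp, h)
        wp = wp + W_out @ h  # each step predicts an offset from the previous waypoint
        out.append(wp)
    return np.stack(out)


class PID:
    """Discrete PID controller (assumed controller type, not confirmed by the paper)."""

    def __init__(self, kp, ki=0.0, kd=0.0, dt=0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv


def control(waypoints, speed, speed_pid, turn_pid, wp_dt=0.5):
    """Map planned waypoints to (steer, throttle, brake); wp_dt is an assumed spacing."""
    # Desired speed: gap between the first two waypoints divided by their time spacing
    desired_speed = float(np.linalg.norm(waypoints[1] - waypoints[0])) / wp_dt
    # Heading error toward an aim point midway between the first two waypoints;
    # positive angle means the aim point lies to the right, i.e. steer right
    aim = (waypoints[0] + waypoints[1]) / 2.0
    angle = math.atan2(aim[1], aim[0])
    steer = float(np.clip(turn_pid.step(angle), -1.0, 1.0))
    brake = bool(desired_speed < 0.4)
    throttle = 0.0 if brake else float(np.clip(speed_pid.step(desired_speed - speed), 0.0, 0.75))
    return steer, throttle, brake


# Usage with random, untrained weights
rng = np.random.default_rng(0)
hidden = 64
cell = GRUCell(input_dim=2, hidden_dim=hidden, rng=rng)
W_out = rng.normal(0.0, 0.1, (2, hidden))
fused = np.tanh(rng.normal(size=hidden))  # stand-in for the MLP fusion output
wps = decode_waypoints(fused, cell, W_out)
steer, throttle, brake = control(wps, speed=3.0, speed_pid=PID(5.0, 0.5, 1.0),
                                 turn_pid=PID(1.25, 0.75, 0.3))
print(wps.shape, steer, throttle, brake)
```

Splitting longitudinal and lateral control into two independent loops mirrors the abstract's "two controllers" for speed and turning. A learned controller head would be an equally valid reading of the text.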