Autonomous driving is an active area of research and development, and many strategies have been explored for decision-making in autonomous vehicles. Rule-based systems, decision trees, Markov decision processes, and Bayesian networks have been popular methods for handling complex traffic conditions and avoiding collisions. With the emergence of deep learning, however, many researchers have turned to CNN-based methods to improve collision avoidance. Although some CNN-based methods achieve promising results, they fail to establish correlations between sequential images, which often leads to more collisions. In this paper, we propose a CNN-based method that overcomes this limitation by establishing feature correlations between regions in sequential images using variants of attention. Our method combines the strength of CNNs in capturing regional features with a bi-directional LSTM that strengthens the relationships between different local areas. Additionally, we use an encoder to improve computational efficiency. Our method takes "Bird's Eye View" graphs generated from camera and LiDAR sensors as input and simulates the position (x, y) and heading offset angle (yaw) to generate future trajectories. Experimental results demonstrate that our proposed method outperforms existing vision-based strategies, averaging only 3.7 collisions per 1000 miles driven on the L5kit test set. This significantly improves the success rate of collision avoidance and provides a promising solution for autonomous driving.
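The headline metric normalizes a raw collision count by the distance driven. The following is a minimal sketch of that arithmetic only. The function names are hypothetical, the inputs are illustrative numbers rather than results from the paper, and the abstract does not state how collisions or distance are actually tallied on the test set.

```python
METERS_PER_MILE = 1609.344  # exact, by definition of the international mile


def meters_to_miles(distance_m: float) -> float:
    """Convert a distance in meters to miles."""
    return distance_m / METERS_PER_MILE


def collisions_per_1000_miles(num_collisions: int, distance_miles: float) -> float:
    """Normalize a raw collision count to a rate per 1000 miles driven."""
    if distance_miles <= 0:
        raise ValueError("distance_miles must be positive")
    return 1000.0 * num_collisions / distance_miles


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the normalization:
    # 37 collisions over 10,000 miles of driving.
    rate = collisions_per_1000_miles(37, 10_000.0)
    print(f"{rate:.1f} collisions per 1000 miles")
```

If distances are logged in meters, as is common for simulation logs, they can be passed through `meters_to_miles` before normalizing.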