Anticipating the motion of neighboring vehicles is crucial for autonomous driving, especially on congested highways where even slight motion variations can result in catastrophic collisions. An accurate prediction of a future trajectory relies not only on the past trajectory but also, more importantly, on a simulation of the complex interactions among nearby vehicles. Most state-of-the-art networks built to tackle this problem assume that past trajectory points are readily available, and hence lack a full end-to-end pipeline with a direct video-to-output mechanism. In this article, we therefore propose a novel end-to-end architecture that takes raw video as input and outputs future trajectory predictions. It first extracts and tracks the 3D locations of nearby vehicles via multi-head attention-based regression networks together with non-linear optimization. This provides the past trajectory points, which are then fed into the trajectory prediction algorithm, an attention-based LSTM encoder-decoder architecture that models the complicated interdependence between the vehicles and accurately predicts the future trajectory points of the surrounding vehicles. The proposed model is evaluated on the large-scale BLVD dataset and has also been implemented on CARLA. The experimental results demonstrate that our approach outperforms various state-of-the-art models.
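To make the idea of attention over surrounding vehicles concrete, the following is a minimal sketch of how a target vehicle's encoding could be used to weight the encodings of its neighbors. It is an illustration under stated assumptions, not the architecture described above: the abstract does not specify the form of attention, so scaled dot-product attention is assumed here, and the function names `softmax` and `attend` as well as the 2-D toy encodings are hypothetical. In the full model, the encodings would presumably come from the LSTM encoder rather than being written by hand.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    query:  encoding of the target vehicle (list of floats)
    keys:   one encoding per neighboring vehicle, same length as query
    values: one vector per neighboring vehicle to be averaged
    Returns the attention-weighted context vector and the weights.
    """
    scale = math.sqrt(len(query))
    # Similarity between the target and each neighbor.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Weighted sum of neighbor vectors, one output dimension at a time.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return context, weights


# Hypothetical 2-D encodings (assumption: stand-ins for LSTM hidden states).
target = [1.0, 0.0]
neighbors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
context, weights = attend(target, neighbors, neighbors)
```

In this toy setup, the neighbor whose encoding is most similar to the target's receives the largest weight, so the context vector passed on to a decoder would be dominated by the vehicles most relevant to the target.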