Trajectory prediction is a key task for vehicle autonomy. While the number of traffic rules is limited, the combinations and uncertainties associated with each agent's behaviour in real-world scenarios are nearly impossible to encode. Consequently, there is growing interest in learning-based trajectory prediction. The method proposed in this paper predicts trajectories by treating perception and trajectory prediction as a unified system. By treating them as a unified task, we show that there is potential to improve the performance of perception. To this end, we present BEVSeg2TP, a surround-view camera, bird's-eye-view-based system for joint vehicle segmentation and ego-vehicle trajectory prediction in autonomous vehicles. The proposed system uses a network trained on multiple camera views. The images are transformed using several deep learning techniques to perform semantic segmentation of objects in the scene, including other vehicles. The segmentation outputs are fused across the camera views to obtain a comprehensive representation of the surrounding vehicles from the bird's-eye-view perspective. The system then predicts the future trajectory of the ego vehicle using a spatiotemporal probabilistic network (STPN) to optimize trajectory prediction. This network leverages information from encoder-decoder transformers and the joint vehicle segmentation.