Automated vehicles operating in urban environments have to interact reliably with other traffic participants. Planning algorithms often rely on separate prediction modules that forecast probabilistic, multi-modal, and interactive behaviors of objects. Designing prediction and planning as two separate modules introduces significant challenges, particularly because the two modules are interdependent. This work proposes a deep learning methodology that combines prediction and planning. A conditional GAN with a U-Net architecture is trained to predict two high-resolution image sequences. The first sequence represents explicit motion predictions and is mainly used to train context understanding. The second contains pixel state values suitable for planning, which encode kinematic reachability, object dynamics, safety, and driving comfort. The model can be trained offline on target images rendered by a sampling-based model-predictive planner, leveraging real-world driving data. Our results demonstrate intuitive behavior in complex situations, such as lane changes with conflicting objectives.
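To illustrate how a sequence of pixel state values could be consumed by a planner, the following is a minimal sketch and not the method of this work. It assumes a hypothetical array `values` of shape `(T, H, W)` in which higher values are better, a known start pixel, and a crude reachability rule that allows a move of at most `max_step` pixels per time step (with `max_step` smaller than `H` and `W`). The function name `plan_on_value_maps` and all parameters are illustrative. Under these assumptions, dynamic programming selects the pixel path with the highest summed value.

```python
import numpy as np

def plan_on_value_maps(values, start, max_step=1):
    """Pick the pixel path with the highest summed value (illustrative only).

    values: (T, H, W) array of per-pixel state values, higher is better.
    start: (row, col) occupied at t = 0.
    max_step: assumed maximum pixel displacement per time step.
    """
    T, H, W = values.shape
    score = np.full((H, W), -np.inf)   # best summed value per pixel so far
    score[start] = values[0][start]
    back = np.zeros((T, H, W, 2), dtype=int)  # displacement used to arrive

    for t in range(1, T):
        new_score = np.full((H, W), -np.inf)
        for dr in range(-max_step, max_step + 1):
            for dc in range(-max_step, max_step + 1):
                # shifted[r, c] = score[r - dr, c - dc]
                shifted = np.full((H, W), -np.inf)
                dst_r = slice(max(dr, 0), H + min(dr, 0))
                dst_c = slice(max(dc, 0), W + min(dc, 0))
                src_r = slice(max(-dr, 0), H + min(-dr, 0))
                src_c = slice(max(-dc, 0), W + min(-dc, 0))
                shifted[dst_r, dst_c] = score[src_r, src_c]
                better = shifted > new_score
                new_score[better] = shifted[better]
                back[t][better] = (dr, dc)
        score = new_score + values[t]  # unreachable pixels stay at -inf

    # Backtrack from the best final pixel to the start.
    r, c = np.unravel_index(np.argmax(score), score.shape)
    path = [(int(r), int(c))]
    for t in range(T - 1, 0, -1):
        dr, dc = back[t][r, c]
        r, c = r - dr, c - dc
        path.append((int(r), int(c)))
    return path[::-1]

# Toy example: a high value at an unreachable pixel is ignored.
values = np.zeros((3, 3, 4))
values[1, 1, 1] = 1.0
values[1, 1, 3] = 9.0   # three columns away after one step: unreachable
values[2, 0, 2] = 1.0
path = plan_on_value_maps(values, start=(1, 0))
```

In this sketch the reachability constraint is a fixed pixel radius; in the approach described above, kinematic reachability is instead encoded in the predicted pixel state values themselves.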