Learning a human-like driving policy from large-scale driving demonstrations is promising, but the uncertainty and non-deterministic nature of planning make it challenging. To cope with this uncertainty, we propose VADv2, an end-to-end driving model based on probabilistic planning. VADv2 takes multi-view image sequences as input in a streaming manner, transforms the sensor data into environmental token embeddings, outputs a probability distribution over actions, and samples one action to control the vehicle. Using only camera sensors, VADv2 achieves state-of-the-art closed-loop performance on the CARLA Town05 benchmark, significantly outperforming all existing methods. It runs stably in a fully end-to-end manner, even without a rule-based wrapper. Closed-loop demos are available at https://hgao-cv.github.io/VADv2.
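The final step described in the abstract, producing a probability distribution over actions and sampling one action from it, can be illustrated with a minimal sketch. This is not the VADv2 implementation: the discrete action set `ACTIONS`, the example scores, and the helper names `softmax` and `sample_action` are all hypothetical, chosen only to show how sampling from a distribution differs from returning a single deterministic plan.

```python
import math
import random

# Hypothetical discrete action set; the abstract does not specify the action space.
ACTIONS = ["keep_lane", "turn_left", "turn_right", "brake"]


def softmax(scores):
    """Convert raw scores into a probability distribution that sums to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(scores, rng):
    """Sample one action with probability proportional to softmax(scores)."""
    probs = softmax(scores)
    action = rng.choices(ACTIONS, weights=probs, k=1)[0]
    return action, probs


rng = random.Random(0)  # seeded so the sketch is reproducible
# Hypothetical scores standing in for a model's per-action output.
scores = [2.0, 0.5, 0.5, 1.0]
action, probs = sample_action(scores, rng)
print(action, [round(p, 3) for p in probs])
```

Because the action is sampled rather than taken as the arg-max, repeated calls with the same scores can return different actions, which is one simple way to represent the non-deterministic nature of planning mentioned above.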