Making safe and human-like decisions is an essential capability of autonomous driving systems, and learning-based behavior planning is a promising pathway toward this objective. Unlike existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. The framework consists of three components: a behavior generation module that produces a diverse set of candidate behaviors in the form of trajectory proposals, a conditional motion prediction network that predicts the future trajectories of other agents based on each proposal, and a scoring module that evaluates the candidate plans using maximum entropy inverse reinforcement learning (IRL). We validate the proposed framework on a large-scale real-world urban driving dataset through comprehensive experiments. The results show that the conditional prediction model can predict distinct and reasonable future trajectories given different trajectory proposals, and that the IRL-based scoring module can select plans that are close to human driving. The proposed framework outperforms the baseline methods in terms of similarity to human driving trajectories. Additionally, we find that the conditional prediction model improves both prediction and planning performance compared to the non-conditional model. Lastly, we note that learning the scoring module is crucial for aligning the evaluations with those of human drivers.
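To make the scoring step concrete, the following is a minimal sketch of maximum entropy IRL over a finite set of candidate plans, assuming a linear reward over hand-picked plan features. It is not the authors' implementation: the function names, the feature vectors, the learning rate, and the step count are all hypothetical. In the framework described above, each feature vector would be computed from a trajectory proposal together with the conditional prediction of the other agents for that proposal.

```python
import math
from typing import List, Sequence


def plan_probabilities(features: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """MaxEnt distribution over candidate plans: P(i) is proportional to exp(w . f_i)."""
    rewards = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    max_r = max(rewards)  # subtract the maximum for numerical stability
    exps = [math.exp(r - max_r) for r in rewards]
    total = sum(exps)
    return [e / total for e in exps]


def select_plan(features: Sequence[Sequence[float]],
                weights: Sequence[float]) -> int:
    """Return the index of the highest-probability (highest-reward) plan."""
    probs = plan_probabilities(features, weights)
    return max(range(len(probs)), key=probs.__getitem__)


def irl_gradient(features: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 human_index: int) -> List[float]:
    """Gradient of the log-likelihood of the human-like plan:
    features of the human plan minus the expected features under P."""
    probs = plan_probabilities(features, weights)
    dim = len(weights)
    expected = [sum(p * feat[k] for p, feat in zip(probs, features))
                for k in range(dim)]
    return [features[human_index][k] - expected[k] for k in range(dim)]


def fit_weights(features: Sequence[Sequence[float]],
                human_index: int,
                steps: int = 200,
                lr: float = 0.1) -> List[float]:
    """Plain gradient ascent on the log-likelihood, starting from zero weights.
    (Hypothetical hyperparameters; a single scene is used for illustration.)"""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        grad = irl_gradient(features, weights, human_index)
        weights = [w + lr * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Three candidate plans with two made-up features each.
    candidate_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    learned = fit_weights(candidate_features, human_index=1)
    print(select_plan(candidate_features, learned))
```

The design choice illustrated here is that the scorer is learned rather than hand-tuned: the weights are pushed toward whatever makes the plan closest to the human trajectory the most probable among the candidates. A practical version would sum the gradient over many scenes rather than fitting a single one.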