Making safe and human-like decisions is an essential capability of autonomous driving systems, and learning-based behavior planning is a promising pathway toward this objective. Unlike existing learning-based methods that directly output decisions, this work introduces a predictive behavior planning framework that learns to predict and evaluate from human driving data. The framework consists of three parts: a behavior generation module that produces a diverse set of candidate behaviors in the form of trajectory proposals, a conditional motion prediction network that predicts other agents' future trajectories conditioned on each proposal, and a scoring module that is trained with maximum entropy inverse reinforcement learning (IRL) to evaluate the candidate plans. We conduct comprehensive experiments to validate the proposed framework on a large-scale real-world urban driving dataset. The results show that the conditional prediction model predicts distinct and reasonable future trajectories for different trajectory proposals, and that the IRL-based scoring module selects plans that are close to human driving. The proposed framework outperforms the baseline methods in terms of similarity to human driving trajectories. Additionally, we find that the conditional prediction model improves both prediction and planning performance compared to the non-conditional model, and that learning the scoring module is crucial for aligning its evaluations with those of human drivers.
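To make the scoring step concrete, the following is a minimal sketch of maximum entropy IRL over a finite set of candidate plans. It is not the authors' implementation. It assumes a linear cost over hand-picked plan features (the paper's scoring module may take a different form), and it assumes each feature vector has already been computed from a trajectory proposal together with the other agents' trajectories predicted for that proposal. The function names, feature values, and hyperparameters are hypothetical.

```python
import math
from typing import List, Sequence


def plan_probabilities(features: Sequence[Sequence[float]],
                       weights: Sequence[float]) -> List[float]:
    """Max-entropy distribution over candidates: P(i) is proportional to exp(-w . f_i)."""
    # Assumption: the cost of a plan is a linear function of its features.
    costs = [sum(w * f for w, f in zip(weights, feat)) for feat in features]
    min_cost = min(costs)  # shift costs so the exponentials stay in (0, 1]
    unnorm = [math.exp(-(c - min_cost)) for c in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def irl_gradient(features: Sequence[Sequence[float]],
                 weights: Sequence[float],
                 human_index: int) -> List[float]:
    """Gradient of -log P(human plan): human features minus expected features."""
    probs = plan_probabilities(features, weights)
    dim = len(weights)
    expected = [sum(p * feat[k] for p, feat in zip(probs, features))
                for k in range(dim)]
    return [features[human_index][k] - expected[k] for k in range(dim)]


def fit_weights(features: Sequence[Sequence[float]],
                human_index: int,
                steps: int = 200,
                lr: float = 0.5) -> List[float]:
    """Plain gradient descent on the negative log-likelihood of the human plan."""
    weights = [0.0] * len(features[0])
    for _ in range(steps):
        grad = irl_gradient(features, weights, human_index)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


if __name__ == "__main__":
    # Hypothetical features per candidate plan, e.g. [risk, discomfort].
    candidate_features = [[1.0, 0.2], [0.3, 0.1], [0.6, 0.9]]
    learned = fit_weights(candidate_features, human_index=1)
    probs = plan_probabilities(candidate_features, learned)
    best = max(range(len(probs)), key=lambda i: probs[i])
    print("selected plan index:", best)
```

In this sketch the candidate closest to the human trajectory is treated as the demonstration, and the learned weights raise its probability relative to the other proposals. At planning time, the same cost would rank newly generated proposals.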