Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus producing overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction because it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either incompatible with modern deep learning prediction models, not interpretable, or unable to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete MDP through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset; the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
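To make the tree-based reduction concrete, the following is a minimal sketch of how a discrete MDP over an ego trajectory tree and a scenario tree could be solved by backward induction: the ego maximizes over its options at each scenario node and takes an expectation over the ego-conditioned environment modes. It is not the TPP implementation. The names `ScenarioNode` and `solve`, the two-stage "nudge/wait" example, and all rewards and probabilities are hypothetical and chosen only for illustration; in a real system the branches would come from a learned ego-conditioned prediction model and the options from a trajectory sampler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ScenarioNode:
    """One environment mode at one stage (hypothetical structure).

    options maps an ego option name to (immediate reward, branches), where
    branches is a list of (probability, next ScenarioNode) pairs that
    depends on the chosen ego option (ego-conditioned prediction).
    """
    name: str
    options: Dict[str, Tuple[float, List[Tuple[float, "ScenarioNode"]]]] = field(
        default_factory=dict
    )


def solve(node: ScenarioNode, policy: Dict[str, str]) -> float:
    """Backward induction: max over ego options, expectation over modes."""
    if not node.options:  # no ego options left: terminal node
        return 0.0
    best_value, best_option = float("-inf"), ""
    for option, (reward, branches) in node.options.items():
        value = reward + sum(p * solve(child, policy) for p, child in branches)
        if value > best_value:
            best_value, best_option = value, option
    policy[node.name] = best_option  # the policy reacts to the observed mode
    return best_value


def second_stage(name: str, go_reward: float, stop_reward: float) -> ScenarioNode:
    """Final-stage node: ego either goes or stops, with no further branching."""
    return ScenarioNode(name, {"go": (go_reward, []), "stop": (stop_reward, [])})


# Toy numbers (assumptions): going is good if the other agent yields and
# very bad if it asserts; nudging makes yielding more likely than waiting.
root = ScenarioNode("root", {
    "nudge": (-1.0, [(0.7, second_stage("yields|nudge", 10.0, 2.0)),
                     (0.3, second_stage("asserts|nudge", -50.0, 1.0))]),
    "wait": (-2.0, [(0.4, second_stage("yields|wait", 10.0, 2.0)),
                    (0.6, second_stage("asserts|wait", -50.0, 1.0))]),
})

policy: Dict[str, str] = {}
value = solve(root, policy)
print(value, policy)
```

In this toy example the resulting policy nudges first, then goes if the other agent yields and stops if it asserts, for an expected value of about 6.3. A single fixed action sequence cannot condition on the observed mode; the best one here (nudge, then stop) achieves only -1 + 0.7·2 + 0.3·1 = 0.7, which mirrors the conservatism of non-policy planning described above.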