Autonomous vehicles (AVs) need to reason about the multimodal behavior of neighboring agents while planning their own motion. Many existing trajectory planners seek a single trajectory that performs well under \emph{all} plausible futures simultaneously, ignoring bi-directional interactions and thus producing overly conservative plans. Policy planning, whereby the ego agent plans a policy that reacts to the environment's multimodal behavior, is a promising direction because it can account for the action-reaction interactions between the AV and the environment. However, most existing policy planners do not scale to the complexity of real autonomous vehicle applications: they are either incompatible with modern deep learning prediction models, not interpretable, or unable to generate high-quality trajectories. To fill this gap, we propose Tree Policy Planning (TPP), a policy planner that is compatible with state-of-the-art deep learning prediction models, generates multistage motion plans, and accounts for the influence of the ego agent on the environment's behavior. The key idea of TPP is to reduce the continuous optimization problem to a tractable discrete Markov Decision Process (MDP) through the construction of two tree structures: an ego trajectory tree for ego trajectory options, and a scenario tree for multimodal ego-conditioned environment predictions. We demonstrate the efficacy of TPP in closed-loop simulations based on the real-world nuScenes dataset; the results show that TPP scales to realistic AV scenarios and significantly outperforms non-policy baselines.
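The reduction to a discrete MDP over the two trees can be illustrated with a minimal sketch. This is not the TPP implementation: the function `solve`, the toy trees, the mode names (`yield`, `cut_in`), and all probabilities and costs below are hypothetical, chosen only to show how a backward recursion over an ego trajectory tree and an ego-conditioned scenario tree yields a policy rather than a single trajectory.

```python
from typing import Callable

Ego = tuple[str, ...]        # path of ego segments from the root of the ego trajectory tree
Branch = tuple[float, str]   # (probability, environment mode) in the scenario tree


def solve(ego: Ego, mode: str,
          ego_children: Callable[[Ego], list[Ego]],
          predict: Callable[[str, Ego], list[Branch]],
          stage_cost: Callable[[Ego, str], float]) -> tuple[float, dict]:
    """Return the expected cost-to-go from (ego, mode) and the cost-minimizing policy."""
    best_value, best_child, best_sub = 0.0, None, {}
    children = ego_children(ego)
    if not children:                       # leaf of the ego tree: nothing left to decide
        return 0.0, {}
    best_value = float("inf")
    for child in children:
        value, sub = 0.0, {}
        # Environment branches are conditioned on the ego segment being evaluated.
        for prob, next_mode in predict(mode, child):
            v, p = solve(child, next_mode, ego_children, predict, stage_cost)
            value += prob * (stage_cost(child, next_mode) + v)
            sub.update(p)
        if value < best_value:
            best_value, best_child, best_sub = value, child, sub
    return best_value, {(ego, mode): best_child, **best_sub}


# --- Hypothetical two-stage example (all numbers are made up) ---
def ego_children(ego: Ego) -> list[Ego]:
    if len(ego) == 0:
        return [("keep",), ("nudge",)]
    if len(ego) == 1:
        return [ego + ("fast",), ego + ("slow",)]
    return []


def predict(mode: str, child: Ego) -> list[Branch]:
    if len(child) == 1:                    # stage 1: ego choice shifts the mode probabilities
        p_yield = 0.8 if child[-1] == "nudge" else 0.5
        return [(p_yield, "yield"), (1.0 - p_yield, "cut_in")]
    return [(1.0, mode)]                   # stage 2: the observed mode persists


COST = {("keep", "yield"): 0.0, ("keep", "cut_in"): 0.0,
        ("nudge", "yield"): 0.1, ("nudge", "cut_in"): 0.1,
        ("fast", "yield"): 0.0, ("fast", "cut_in"): 10.0,
        ("slow", "yield"): 2.0, ("slow", "cut_in"): 1.0}


def stage_cost(child: Ego, mode: str) -> float:
    return COST[(child[-1], mode)]


value, policy = solve((), "start", ego_children, predict, stage_cost)
print(value, policy)
```

In this toy setup the resulting policy selects the `nudge` segment first, then `fast` if the other agent yields and `slow` if it cuts in. A planner that must commit to one second-stage segment for both modes cannot express this branching, which is the source of the conservatism described above.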