Autonomous vehicles must often contend with conflicting planning requirements; for example, safety and comfort can be at odds if avoiding a collision calls for slamming on the brakes. To resolve such conflicts, assigning an importance ranking to rules (i.e., imposing a rule hierarchy) has been proposed, which in turn induces a ranking on trajectories based on the importance of the rules they satisfy. On the one hand, rule hierarchies enhance interpretability but introduce combinatorial complexity into planning; on the other hand, differentiable reward structures can be leveraged by modern gradient-based optimization tools but are less interpretable and unintuitive to tune. In this paper, we present an approach to equivalently express rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds. We achieve this by formulating rank-preserving reward functions that are monotonic in the rank of the trajectories induced by the rule hierarchy; i.e., higher-ranked trajectories receive higher reward. Equipped with a rule hierarchy and its corresponding rank-preserving reward function, we develop a two-stage planner that can efficiently resolve conflicting planning requirements. We demonstrate that our approach can generate motion plans at ~7-10 Hz for various challenging road navigation and intersection negotiation scenarios.
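To make the idea of a rank-preserving reward concrete, the following is a minimal sketch and not the formulation used in this paper. It assumes that trajectories are ranked lexicographically by which rules they satisfy, with rule 0 the most important; the power-of-two weights, the sigmoid smoothing, and the names `hard_reward`, `soft_reward`, and `sharpness` are all hypothetical choices made for illustration.

```python
import math


def hard_reward(satisfied):
    """Reward for a tuple of booleans, one per rule, most important first.

    Assumption: rule i gets weight 2**(N-1-i), so satisfying one rule
    always outweighs satisfying every less important rule combined.
    """
    n = len(satisfied)
    return sum(2 ** (n - 1 - i) for i, s in enumerate(satisfied) if s)


def soft_reward(robustness, sharpness=10.0):
    """Smooth surrogate of hard_reward.

    `robustness[i]` is a hypothetical signed margin for rule i
    (positive means satisfied). The step function is replaced by a
    sigmoid, written via tanh for numerical stability, so the reward
    is differentiable in the margins.
    """
    n = len(robustness)
    total = 0.0
    for i, r in enumerate(robustness):
        sigmoid = 0.5 * (1.0 + math.tanh(0.5 * sharpness * r))
        total += 2 ** (n - 1 - i) * sigmoid
    return total


if __name__ == "__main__":
    # Satisfying only the top rule beats satisfying only the two lower rules.
    print(hard_reward((True, False, False)), hard_reward((False, True, True)))
    print(soft_reward([1.0, -1.0, -1.0]), soft_reward([-1.0, 1.0, 1.0]))
```

Under these assumptions, the hard reward assigns a distinct value to every satisfaction pattern and orders the patterns exactly as the hierarchy does. The smooth version keeps that ordering only when the margins are clearly positive or negative relative to `sharpness`.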