Autonomous vehicles must often contend with conflicting planning requirements; e.g., safety and comfort can be at odds if avoiding a collision calls for slamming the brakes. To resolve such conflicts, assigning an importance ranking to rules (i.e., imposing a rule hierarchy) has been proposed; the hierarchy, in turn, induces a ranking of trajectories based on the importance of the rules they satisfy. On one hand, rule hierarchies enhance interpretability but introduce combinatorial complexity into planning; on the other hand, differentiable reward structures can be exploited by modern gradient-based optimization tools but are less interpretable and hard to tune intuitively. In this paper, we present an approach that equivalently expresses rule hierarchies as differentiable reward structures amenable to modern gradient-based optimizers, thereby achieving the best of both worlds. We do so by formulating rank-preserving reward functions that are monotonic in the rank of the trajectories induced by the rule hierarchy; i.e., higher-ranked trajectories receive higher reward. Equipped with a rule hierarchy and its corresponding rank-preserving reward function, we develop a two-stage planner that efficiently resolves conflicting planning requirements. We demonstrate that our approach generates motion plans at approximately 7-10 Hz in various challenging road-navigation and intersection-negotiation scenarios.
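To make "rank-preserving" concrete, the following is a minimal sketch, not necessarily the construction used in this paper. It assumes that rules are ordered from most to least important, that the hierarchy ranks trajectories lexicographically (satisfying a more important rule outweighs satisfying all less important ones), and that each rule exposes a scalar robustness value that is non-negative when the rule is satisfied, as in signal temporal logic. Weighting rule `i` by `2**(N-1-i)` gives a step reward equal to `2**N - rank`, so it is strictly monotonic in rank. Replacing each step with a sigmoid yields a differentiable surrogate that only approximately preserves the ranking, and more closely as `sharpness` grows. All function names and the `sharpness` parameter are hypothetical.

```python
import itertools
import math
from typing import Sequence


def rank_from_satisfaction(sat: Sequence[bool]) -> int:
    """Rank (1 = best) of a satisfaction pattern under a lexicographic hierarchy.

    sat[0] is the most important rule. The violation pattern is read as a
    binary number with the most important rule as the most significant bit.
    """
    n = len(sat)
    violations = sum((0 if s else 1) << (n - 1 - i) for i, s in enumerate(sat))
    return violations + 1


def step_reward(robustness: Sequence[float]) -> float:
    """Rank-preserving step reward (assumption: rule i satisfied iff rho_i >= 0).

    Equals 2**N - rank, so a better (lower) rank always gets a higher reward.
    """
    n = len(robustness)
    return float(sum(1 << (n - 1 - i) for i, rho in enumerate(robustness) if rho >= 0))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def smooth_reward(robustness: Sequence[float], sharpness: float = 20.0) -> float:
    """Differentiable surrogate: each hard step is replaced by a sigmoid.

    Approximates step_reward; ranking is preserved only approximately and can
    break when robustness values are close to zero relative to 1/sharpness.
    """
    n = len(robustness)
    return sum(2.0 ** (n - 1 - i) * _sigmoid(sharpness * rho)
               for i, rho in enumerate(robustness))


if __name__ == "__main__":
    # Enumerate all satisfaction patterns for a 3-rule hierarchy, using
    # robustness +1 for satisfied rules and -1 for violated ones.
    for pattern in itertools.product([True, False], repeat=3):
        rho = [1.0 if s else -1.0 for s in pattern]
        print(pattern, rank_from_satisfaction(pattern),
              step_reward(rho), round(smooth_reward(rho), 3))
```

The powers-of-two weights are the simplest way to make a weighted sum agree with a lexicographic order. They grow exponentially with the number of rules, which is one reason a practical planner may prefer a different rank-preserving construction.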