Hierarchical Control for Head-to-Head Autonomous Racing

We develop a hierarchical controller for head-to-head autonomous racing. We first introduce a formulation of a racing game with realistic safety and fairness rules. A high-level planner approximates the original formulation as a discrete game with simplified state, control, and dynamics, so that the complex safety and fairness rules are easy to encode, and computes a series of target waypoints. A low-level controller takes the resulting waypoints as a reference trajectory and computes high-resolution control inputs by solving an alternative approximation of the formulation with simplified objectives and constraints. We consider two approaches for the low-level controller, which yields two hierarchical controllers. One approach uses multi-agent reinforcement learning (MARL), and the other solves a linear-quadratic Nash game (LQNG) to produce control inputs. The controllers are compared against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that the proposed hierarchical methods outperform their respective baselines in both head-to-head race wins and adherence to the rules. The hierarchical controller using MARL for low-level control outperforms all other methods, winning over 90% of head-to-head races and adhering more consistently to the complex racing rules. Qualitatively, we observe the proposed controllers mimicking actions performed by expert human drivers, such as shielding/blocking, overtaking, and long-term planning for delayed advantages. We show that hierarchical planning for game-theoretic reasoning produces competitive behavior even when challenged with complex rules and constraints.
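
The two-level structure described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The high-level step here is a single-agent exhaustive search over lane changes on a checkpoint grid, with the opponent treated as a set of static blocked cells rather than as a player in a discrete game. The low-level step is a clipped proportional steering law rather than MARL or an LQNG solver. All names (`Waypoint`, `plan_waypoints`, `steering_command`), the grid layout, the cost, the gains, and the sign convention are assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Waypoint:
    checkpoint: int  # index of the checkpoint along the track (hypothetical discretization)
    lane: int        # lateral lane index, 0 .. num_lanes - 1


def plan_waypoints(start, horizon, num_lanes, blocked):
    """High-level stand-in: pick the lane sequence with the fewest lane changes.

    `blocked` is a set of (checkpoint, lane) cells the plan may not enter.
    Returns a list of Waypoint objects, or None if no feasible plan exists.
    """
    best_cost, best_plan = None, None
    # Each move shifts the lane by -1, 0, or +1 between consecutive checkpoints.
    for moves in product((0, -1, 1), repeat=horizon):
        lane, plan, feasible = start.lane, [], True
        for step, move in enumerate(moves, start=1):
            lane += move
            cell = (start.checkpoint + step, lane)
            if not 0 <= lane < num_lanes or cell in blocked:
                feasible = False
                break
            plan.append(Waypoint(*cell))
        cost = sum(abs(m) for m in moves)  # number of lane changes
        if feasible and (best_cost is None or cost < best_cost):
            best_cost, best_plan = cost, plan
    return best_plan


def steering_command(lateral_pos, target, lane_width=1.0, gain=0.5, limit=0.3):
    """Low-level stand-in: clipped proportional steering toward the target lane.

    Assumes lane i is centered at i * lane_width and that a negative command
    steers toward lower lane indices.
    """
    error = target.lane * lane_width - lateral_pos
    return max(-limit, min(limit, gain * error))


if __name__ == "__main__":
    # The ego car starts in the middle lane; the cell two checkpoints ahead is blocked.
    plan = plan_waypoints(Waypoint(0, 1), horizon=3, num_lanes=3, blocked={(2, 1)})
    print(plan)
    print(steering_command(lateral_pos=1.0, target=plan[1]))
```

In this sketch the planner runs at the coarse checkpoint resolution and the steering law would run at every control step, tracking whichever waypoint is next. That mirrors the division of labor in the abstract without reproducing either of its low-level methods.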
