We investigate autonomous racing among teams of cooperative agents that are subject to realistic racing rules. Our work extends previous research on hierarchical control in head-to-head autonomous racing to a generalized, team-based version of the problem while retaining the two-level hierarchical control structure. A high-level tactical planner constructs a discrete game that encodes the complex rules using simplified dynamics and produces a sequence of target waypoints. The low-level path planner uses these waypoints as a reference trajectory and computes high-resolution control inputs by solving a simplified formulation of the racing game, in which the realistic racing rules are likewise represented in simplified form. We explore two approaches for the low-level path planner: training a multi-agent reinforcement learning (MARL) policy and solving a linear-quadratic Nash game (LQNG) approximation. We evaluate our controllers on simple and complex tracks against three baselines: an end-to-end MARL controller, a MARL controller tracking a fixed racing line, and an LQNG controller tracking a fixed racing line. Quantitative results show that our hierarchical methods outperform the baselines in terms of race wins, overall team performance, and compliance with the rules. Qualitatively, we observe that the hierarchical controllers mimic actions performed by expert human drivers, such as coordinated overtaking, defending against multiple opponents, and long-term planning for delayed advantages.
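To illustrate the two-level structure described above, here is a minimal single-agent sketch. It is not the authors' method. The high level is reduced to a dynamic program over a coarse lane grid rather than a multi-agent discrete game, and the low level is a saturated proportional tracker standing in for the MARL or LQNG planner. All names and parameters (`TacticalPlanner`, `track`, lane offsets, gains, the `blocked` sets) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

Waypoint = Tuple[float, float]  # (distance along track, lateral offset)


@dataclass
class TacticalPlanner:
    """High level (hypothetical): choose one lane per checkpoint on a coarse grid."""
    lane_offsets: Tuple[float, ...] = (-1.0, 0.0, 1.0)
    checkpoint_spacing: float = 10.0
    lane_change_cost: float = 0.5

    def plan(self, start_lane: int, blocked: List[Set[int]]) -> List[Waypoint]:
        n = len(self.lane_offsets)
        inf = float("inf")
        cost = [0.0 if lane == start_lane else inf for lane in range(n)]
        back = []  # back[k][lane] = best predecessor lane at checkpoint k
        for occupied in blocked:
            new_cost, choice = [inf] * n, [0] * n
            for lane in range(n):
                if lane in occupied:  # stand-in for a rule or collision constraint
                    continue
                for prev in range(n):
                    if abs(lane - prev) > 1:  # at most one lane change per step
                        continue
                    c = cost[prev] + self.lane_change_cost * abs(lane - prev)
                    if c < new_cost[lane]:
                        new_cost[lane], choice[lane] = c, prev
            cost = new_cost
            back.append(choice)
        lane = min(range(n), key=lambda i: cost[i])
        if cost[lane] == inf:
            raise ValueError("no feasible lane sequence")
        lanes = []
        for choice in reversed(back):  # backtrack from the cheapest final lane
            lanes.append(lane)
            lane = choice[lane]
        lanes.reverse()
        return [((k + 1) * self.checkpoint_spacing, self.lane_offsets[i])
                for k, i in enumerate(lanes)]


def track(waypoints: List[Waypoint], d0: float = 0.0, speed: float = 10.0,
          dt: float = 0.1, gain: float = 2.0, u_max: float = 3.0):
    """Low level (hypothetical): fine-grained lateral-velocity commands that
    steer toward the next waypoint, using a saturated proportional law."""
    s, d, controls = 0.0, d0, []
    for s_target, d_target in waypoints:
        while s < s_target:
            u = max(-u_max, min(u_max, gain * (d_target - d)))
            d += u * dt
            s += speed * dt
            controls.append(u)
    return controls, d


# The centre lane (index 1) is blocked at the second checkpoint.
waypoints = TacticalPlanner().plan(start_lane=1, blocked=[set(), {1}, set()])
controls, final_offset = track(waypoints)
```

The point of the split is visible in the sizes: the planner emits a handful of coarse waypoints, while the tracker emits many control inputs per waypoint at the finer time step `dt`.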