TC-Driver: Trajectory Conditioned Driving for Robust Autonomous Racing -- A Reinforcement Learning Approach

Autonomous racing is becoming popular among academic and industry researchers as a test for general autonomous driving, pushing perception, planning, and control algorithms to their limits. While traditional control methods such as model predictive control (MPC) can generate an optimal control sequence at the edge of the vehicle's physical controllability, these methods are sensitive to the accuracy of the modeling parameters. This paper presents TC-Driver, a reinforcement learning (RL) approach for robust control in autonomous racing. In particular, the TC-Driver agent is conditioned on a trajectory generated by an arbitrary traditional high-level planner. TC-Driver addresses tire parameter modeling inaccuracies by exploiting the heuristic nature of RL while leveraging the reliability of traditional planning methods in a hierarchical control structure. We train the agent under varying tire conditions, allowing it to generalize to different model parameters, with the aim of increasing the racing capabilities of the system in practice. The proposed RL method outperforms a non-learning-based MPC with a 2.7-fold lower crash ratio in a model mismatch setting, underlining its robustness to parameter discrepancies. In addition, the average RL inference duration is 0.25 ms compared to the average MPC solving time of 11.5 ms, a roughly 46-fold speedup, which allows complex control to be deployed on computationally constrained devices. Lastly, we show that the frequently used end-to-end RL architecture, in which a control policy is learned directly from sensory input, is well suited neither to model mismatch robustness nor to track generalization. Our realistic simulations show that TC-Driver achieves 6.7-fold and 3-fold lower crash ratios under the model mismatch and track generalization settings, respectively, while simultaneously achieving lower lap times than an end-to-end approach, demonstrating the viability of TC-Driver for robust autonomous racing.
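To make the hierarchical structure more concrete, the following is a minimal sketch of how a trajectory-conditioned observation might be assembled and how tire parameters could be resampled for each training episode. It assumes a planner that outputs closed-track waypoints of the form (x, y, v_ref). The function names, the 5-waypoint horizon, and the tire-parameter names and ranges are all hypothetical illustrations, not TC-Driver's actual observation space or training configuration.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class TireParams:
    # Hypothetical tire coefficients; names and ranges below are illustrative only.
    friction: float
    stiffness_b: float
    shape_c: float


def sample_tire_params(rng: random.Random) -> TireParams:
    """Draw a new tire model, e.g. once per training episode (domain randomization)."""
    return TireParams(
        friction=rng.uniform(0.5, 1.1),
        stiffness_b=rng.uniform(5.0, 12.0),
        shape_c=rng.uniform(1.2, 2.0),
    )


def to_vehicle_frame(px, py, x, y, yaw):
    """Express a world-frame point in the car frame (x forward, y to the left)."""
    dx, dy = px - x, py - y
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * dx + s * dy, -s * dx + c * dy)


def trajectory_observation(state, trajectory, horizon=5):
    """Build a trajectory-conditioned observation (hypothetical layout).

    state: (x, y, yaw, v) of the car in the world frame.
    trajectory: list of (x, y, v_ref) waypoints from any high-level planner,
        assumed to describe a closed track so indices wrap around.
    Returns [v, then (local_x, local_y, v_ref) for the next `horizon` waypoints].
    """
    x, y, yaw, v = state
    # Start from the waypoint closest to the car.
    i0 = min(
        range(len(trajectory)),
        key=lambda i: (trajectory[i][0] - x) ** 2 + (trajectory[i][1] - y) ** 2,
    )
    obs = [v]
    for k in range(horizon):
        px, py, v_ref = trajectory[(i0 + k) % len(trajectory)]
        lx, ly = to_vehicle_frame(px, py, x, y, yaw)
        obs.extend([lx, ly, v_ref])
    return obs
```

Because the policy only sees the upcoming reference points relative to the car, rather than raw sensor data, any planner that produces such waypoints could in principle drive it. Resampling `TireParams` each episode is one simple way to expose the agent to varying tire conditions.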
