Sampling-based trajectory planners are widely used for agile autonomous driving due to their ability to generate fast, smooth, and kinodynamically feasible trajectories. However, their behavior is often governed by a cost function with manually tuned, static weights, which forces a tactical compromise that is suboptimal across the wide range of scenarios encountered in a race. To address this shortcoming, we propose using a Reinforcement Learning (RL) agent as a high-level behavioral selector that dynamically switches the cost function parameters of an analytical, low-level trajectory planner at runtime. We demonstrate the effectiveness of our approach in a simulated autonomous racing environment, where our RL-based planner achieves a 0% collision rate while reducing overtaking time by up to 60% compared to state-of-the-art static planners. The resulting agent dynamically switches between aggressive and conservative behaviors, enabling interactive maneuvers unattainable with static configurations. These results demonstrate that integrating reinforcement learning as a high-level selector resolves the inherent trade-off between safety and competitiveness in autonomous racing planners. The proposed methodology offers a pathway toward adaptive yet interpretable motion planning for broader autonomous driving applications.
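To make the described architecture concrete, the following is a minimal sketch of the hierarchy the abstract outlines: a high-level policy selecting among discrete cost-weight presets for a sampling-based low-level planner. The preset names, weight values, cost terms, and the placeholder policy are illustrative assumptions, not the paper's actual parameters or learned agent.

```python
import numpy as np

# Hypothetical cost-weight presets the high-level selector can switch between.
# Names and values are illustrative assumptions, not taken from the paper.
PRESETS = {
    "conservative": {"w_progress": 0.5, "w_obstacle": 3.0, "w_smoothness": 1.0},
    "aggressive":   {"w_progress": 2.0, "w_obstacle": 0.8, "w_smoothness": 0.5},
}

def trajectory_cost(traj, weights):
    """Weighted sum of generic per-trajectory terms for a sampling-based planner.

    `traj` holds precomputed terms: progress along the raceline, minimum gap
    to the opponent, and a curvature-based smoothness penalty.
    """
    return (-weights["w_progress"] * traj["progress"]
            + weights["w_obstacle"] / max(traj["min_gap"], 1e-3)
            + weights["w_smoothness"] * traj["curvature"])

def plan(candidates, weights):
    """Low-level planner: return the sampled trajectory with the lowest cost."""
    return min(candidates, key=lambda t: trajectory_cost(t, weights))

def select_preset(observation, policy):
    """High-level selector: an RL policy maps the observation to a preset index."""
    action = policy(observation)  # in practice, e.g. argmax over learned Q-values
    return list(PRESETS)[action]

# Toy usage with randomly sampled candidate trajectories.
rng = np.random.default_rng(0)
candidates = [{"progress": rng.uniform(5, 15),
               "min_gap": rng.uniform(0.2, 3.0),
               "curvature": rng.uniform(0.0, 1.0)} for _ in range(32)]

# Placeholder policy (assumption): go aggressive only when the gap is large.
policy = lambda obs: 1 if obs["gap_to_opponent"] > 1.5 else 0

preset = select_preset({"gap_to_opponent": 2.0}, policy)
best = plan(candidates, PRESETS[preset])
print(preset, round(trajectory_cost(best, PRESETS[preset]), 3))
```

The design point this sketch reflects is that the learned component only chooses among interpretable weight configurations, while trajectory generation and feasibility remain the responsibility of the analytical planner.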