Pure Pursuit (PP) is widely used in autonomous racing for real-time path tracking because of its efficiency and geometric clarity, yet its performance is highly sensitive to the choice of two key parameters: the lookahead distance and the steering gain. Standard velocity-based schedules adjust these only approximately and often fail to transfer across tracks and speed profiles. We propose a reinforcement-learning (RL) approach that jointly selects the lookahead distance Ld and a steering gain g online using Proximal Policy Optimization (PPO). The policy observes compact state features (speed and curvature taps) and outputs (Ld, g) at each control step. Trained in F1TENTH Gym and deployed in a ROS 2 stack, the policy drives PP directly (with light smoothing) and requires no per-map retuning. Across simulation and real-car tests, the proposed RL-PP controller consistently outperforms fixed-lookahead PP, velocity-scheduled adaptive PP, and an RL lookahead-only variant in lap time, path-tracking accuracy, and steering smoothness; under our evaluated settings it also exceeds a kinematic MPC raceline tracker on the same metrics. These results show that policy-guided parameter tuning can reliably improve classical geometry-based control.
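To make the roles of Ld and g concrete, the following is a minimal sketch of one Pure Pursuit control step, not the implementation evaluated here. It assumes the gain g multiplies the classical PP steering angle and that the "light smoothing" is a first-order low-pass filter; neither detail is specified above. The function names, the wheelbase value of 0.33 m, and the filter coefficient `alpha` are illustrative assumptions.

```python
import math


def pure_pursuit_steer(x, y, yaw, target, lookahead, gain, wheelbase=0.33):
    """Steering angle (rad) toward a target point on the path.

    `target` is assumed to lie roughly `lookahead` metres ahead on the
    reference path. `gain` scaling the angle is an assumed formulation.
    """
    dx = target[0] - x
    dy = target[1] - y
    # Lateral offset of the target expressed in the vehicle frame.
    y_local = -math.sin(yaw) * dx + math.cos(yaw) * dy
    # Classical Pure Pursuit arc curvature: kappa = 2 * y_local / Ld^2.
    curvature = 2.0 * y_local / (lookahead ** 2)
    # Bicycle-model steering angle, scaled by the (assumed) gain g.
    return gain * math.atan(wheelbase * curvature)


def smooth(previous, new, alpha=0.3):
    """First-order low-pass filter (assumed form of the light smoothing)."""
    return (1.0 - alpha) * previous + alpha * new
```

In this sketch a learned policy would supply `lookahead` and `gain` at every control step, and `smooth` would be applied to those outputs or to the resulting steering command before it is sent to the vehicle.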