We study online fine-tuning of pretrained control policies for autonomous driving using Real-Time Recurrent Reinforcement Learning (RTRRL), a memory-efficient algorithm that updates policy parameters at every time step without backpropagation through time. We extend RTRRL to support LrcSSM, a recently proposed nonlinear diagonal state-space model, and combine offline behavioral cloning with online RTRRL fine-tuning to adapt policies to distribution shifts at deployment. We validate the approach in the CarRacing simulation and on a 1:10-scale RoboRacer platform equipped with an event camera, where a pretrained policy is fine-tuned online during real-world line-following. To our knowledge, this is the first demonstration of online RL fine-tuning with event-camera observations on standard (non-spiking) hardware in closed-loop control. LrcSSM-based policies improve fastest and most consistently across both settings.
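To illustrate what updating parameters at every time step without backpropagation through time can look like for a diagonal recurrence, the following is a minimal sketch under stated assumptions. It is not the RTRRL algorithm and not the LrcSSM model: it uses a toy recurrence h_t = tanh(a * h_{t-1} + b * u_t) with element-wise parameters, a scalar input, and a supervised squared-error loss in place of a reinforcement learning objective. All names (`rtrl_step`, `rollout`, `online_fit`) are hypothetical. The point it shows is that, because each hidden unit depends only on its own previous value, the forward sensitivities dh_t/da and dh_t/db are vectors with one entry per unit, so they can be carried forward in constant memory per step.

```python
import numpy as np

def rtrl_step(h, e_a, e_b, a, b, u):
    """One step of the toy diagonal recurrence h_t = tanh(a * h_{t-1} + b * u_t),
    together with the forward sensitivity traces e_a = dh_t/da and e_b = dh_t/db.
    The recurrence is diagonal, so each trace is a vector, not a matrix."""
    h_new = np.tanh(a * h + b * u)
    d = 1.0 - h_new ** 2              # derivative of tanh at the pre-activation
    e_a_new = d * (h + a * e_a)       # chain rule through h_{t-1}
    e_b_new = d * (u + a * e_b)
    return h_new, e_a_new, e_b_new

def rollout(a, b, inputs):
    """Run the recurrence with fixed parameters; return final state and traces."""
    n = a.shape[0]
    h, e_a, e_b = np.zeros(n), np.zeros(n), np.zeros(n)
    for u in inputs:
        h, e_a, e_b = rtrl_step(h, e_a, e_b, a, b, u)
    return h, e_a, e_b

def online_fit(inputs, targets, n=8, lr=0.05, seed=0):
    """Update all parameters after every time step using only the current traces.
    Assumption: a linear readout w and a squared-error loss stand in for the
    actor-critic objective used in reinforcement learning."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.1, 0.9, n)
    b = rng.normal(0.0, 0.5, n)
    w = rng.normal(0.0, 0.5, n)
    h, e_a, e_b = np.zeros(n), np.zeros(n), np.zeros(n)
    losses = []
    for u, y in zip(inputs, targets):
        h, e_a, e_b = rtrl_step(h, e_a, e_b, a, b, u)
        err = w @ h - y
        losses.append(0.5 * err ** 2)
        # Gradients of the instantaneous loss; no past activations are stored.
        g_w, g_a, g_b = err * h, err * w * e_a, err * w * e_b
        # Once parameters change online, the traces are only approximate,
        # since they were accumulated under earlier parameter values.
        w, a, b = w - lr * g_w, a - lr * g_a, b - lr * g_b
    return np.array(losses)
```

With fixed parameters, the traces returned by `rollout` match a finite-difference estimate of the gradient of the final hidden state, which is a quick way to check the recursion. `online_fit` can be called on any pair of equal-length input and target sequences, for example `online_fit(np.sin(0.3 * np.arange(200)), np.cos(0.3 * np.arange(200)))`.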