Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomous systems encounter changes in system dynamics, sensor drift, or shifting task objectives, the performance of a fixed policy rapidly degrades. We show that Real-Time Recurrent Reinforcement Learning (RTRRL), a biologically plausible algorithm for online adaptation, can effectively fine-tune a pretrained policy and thereby improve autonomous agents' performance on driving tasks. We further show that RTRRL synergizes with a recent biologically inspired recurrent network model, the Liquid-Resistance Liquid-Capacitance (LRC) RNN. We demonstrate the effectiveness of this closed-loop approach in the simulated CarRacing environment and in a real-world line-following task with a RoboRacer car equipped with an event camera.