Residual policy learning (RPL), in which a learned policy refines a static base policy using deep reinforcement learning (DRL), has shown strong performance across various robotic applications. Its effectiveness is particularly evident in autonomous racing, a domain that serves as a challenging benchmark for real-world DRL. However, deploying RPL-based controllers adds system complexity and increases inference latency. We address this by introducing an extension of RPL named attenuated residual policy optimization ($α$-RPO). Unlike standard RPL, $α$-RPO yields a standalone neural policy by progressively attenuating the base policy, which initially serves to bootstrap learning. Furthermore, this mechanism enables a form of privileged learning, in which the base policy may use sensor modalities that are not required for final deployment. We design $α$-RPO to integrate seamlessly with proximal policy optimization (PPO), ensuring that the attenuated influence of the base controller is dynamically compensated during policy optimization. We evaluate $α$-RPO within a framework for 1:10-scale autonomous racing built around it. In both simulation and zero-shot real-world transfer to Roboracer cars, $α$-RPO not only reduces system complexity but also improves driving performance compared to baselines, demonstrating its practicality for robotic deployment. Our code is available at: https://github.com/raphajaner/arpo_racing.
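To illustrate the attenuation idea, the sketch below composes the executed action as the learned residual plus a scaled base action, $a = \pi_\theta(o) + \alpha\,\pi_{\text{base}}(o_{\text{priv}})$, and drives $\alpha$ from 1 to 0 during training. The additive composition follows the usual RPL formulation; the linear schedule and the names `alpha_schedule`, `attenuated_action`, `obs`, and `priv_obs` are hypothetical assumptions for illustration, not the authors' implementation. In particular, the sketch does not model how $α$-RPO compensates for the attenuated base controller inside PPO.

```python
import numpy as np


def alpha_schedule(step: int, total_steps: int) -> float:
    # Hypothetical linear schedule (an assumption): alpha decays from 1.0
    # to 0.0 over total_steps and stays at 0.0 afterwards.
    return max(0.0, 1.0 - step / total_steps)


def attenuated_action(residual_policy, base_policy, obs, priv_obs, alpha):
    # The learned policy only sees the deployable observation `obs`.
    residual = np.asarray(residual_policy(obs), dtype=float)
    if alpha == 0.0:
        # Fully attenuated: the base policy and its privileged inputs are
        # never queried, so the learned policy runs standalone.
        return residual
    # The base policy may consume privileged inputs `priv_obs` that are
    # unavailable at deployment (hypothetical interface).
    base = np.asarray(base_policy(priv_obs), dtype=float)
    # Additive residual composition, with the base action scaled by alpha.
    return residual + alpha * base


# Usage: halfway through a 100-step schedule, alpha = 0.5.
alpha = alpha_schedule(50, 100)
action = attenuated_action(lambda o: [0.25, 0.5], lambda p: [0.5, 1.0], None, None, alpha)
print(action)
```

Skipping the base-policy call once $\alpha = 0$ mirrors the deployment benefit claimed above: the final controller needs neither the base policy nor the sensors it relied on.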