Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we propose RAD, a 3DGS-based closed-loop Reinforcement Learning (RL) framework for end-to-end AD. By leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. To better align with human driving behavior, we incorporate IL into RL training as a regularization term. We introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, notably a 3× lower collision rate. Extensive closed-loop results are presented in the supplementary material. Code is available at https://github.com/hustvl/RAD to facilitate future research.