End-to-end (E2E) autonomous driving offers a promising way to map perceptual inputs directly to driving actions. However, prohibitive annotation costs and the temporal degradation of data quality hinder long-term real-world deployment. Combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement. However, conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, which leads to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator. This enables multi-mode trajectory probing, in which candidate trajectories are rolled out to assess their prospective rewards. By grounding reward functions in physically simulated interaction signals, this design establishes a bidirectional knowledge exchange between IL and RL and provides immediate, dense feedback instead of sparse signals from catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method outperforms existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
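The contrast between sparse, event-based rewards and dense, shaped feedback, together with the idea of probing several candidate trajectory modes, can be illustrated with a toy 2D sketch. This is not GSDrive's reward function. There is no 3DGS rendering and no flow-matching predictor here. The candidate trajectories are hand-written stand-ins for predictor samples. The names `sparse_reward`, `dense_reward`, and `probe`, along with the linear safety-margin penalty, are hypothetical choices made only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = List[Point]


def sparse_reward(traj: Trajectory, obstacles: Sequence[Point], radius: float = 1.0) -> float:
    """Hypothetical event-based reward: -1 if any waypoint collides, else 0."""
    for p in traj:
        if any(math.dist(p, o) < radius for o in obstacles):
            return -1.0
    return 0.0


def dense_reward(traj: Trajectory, obstacles: Sequence[Point], margin: float = 3.0) -> float:
    """Hypothetical shaped reward: a per-waypoint penalty that grows linearly
    as the ego point enters a safety margin around each obstacle, averaged
    over the horizon (an assumed form, not the paper's)."""
    penalty = 0.0
    for p in traj:
        for o in obstacles:
            penalty += max(0.0, margin - math.dist(p, o))
    return 0.0 - penalty / len(traj)


def probe(candidates: Sequence[Trajectory], obstacles: Sequence[Point]) -> Tuple[int, List[float]]:
    """Roll out (here: simply score) every candidate mode with the dense
    reward and return the index of the best one plus all scores."""
    scores = [dense_reward(c, obstacles) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical scene: one static obstacle and three hand-written candidate
# modes standing in for samples from a trajectory predictor.
obstacles = [(5.0, 0.0)]
xs = [float(x) for x in range(11)]
candidates = [
    [(x, 0.0) for x in xs],  # drives straight through the obstacle
    [(x, 1.5) for x in xs],  # passes close but does not collide
    [(x, 4.0) for x in xs],  # keeps a wide berth
]

sparse = [sparse_reward(c, obstacles) for c in candidates]
best, dense = probe(candidates, obstacles)
print("sparse:", sparse)
print("dense:", [round(s, 3) for s in dense])
print("best mode:", best)
```

The sparse reward gives the near-miss and the wide-berth modes the same score of 0, so it cannot tell them apart. The dense reward ranks all three modes, which is the kind of immediate signal the abstract contrasts with learning only from collisions.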