End-to-end (E2E) autonomous driving is a promising approach for translating perceptual inputs directly into driving actions. However, prohibitive annotation costs and the degradation of data quality over time hinder long-term real-world deployment. Combining imitation learning (IL) and reinforcement learning (RL) is a common strategy for policy improvement, but conventional RL training relies on delayed, event-based rewards: policies learn only from catastrophic outcomes such as collisions, which leads to premature convergence to suboptimal behaviors. To address these limitations, we introduce GSDrive, a framework that exploits 3D Gaussian Splatting (3DGS) for differentiable, physics-based reward shaping in E2E driving policy improvement. Our method incorporates a flow matching-based trajectory predictor within the 3DGS simulator, enabling multi-mode trajectory probing in which candidate trajectories are rolled out to assess their prospective rewards. Grounding the reward functions in physically simulated interaction signals establishes a bidirectional knowledge exchange between IL and RL and provides immediate, dense feedback instead of sparse catastrophic events. Evaluated on the reconstructed nuScenes dataset, our method surpasses existing simulation-based RL driving approaches in closed-loop experiments. Code is available at https://github.com/ZionGo6/GSDrive.
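To make the contrast between sparse, event-based rewards and dense, shaped rewards concrete, the following is a minimal toy sketch of probing several candidate trajectories. It is not the GSDrive implementation: there is no 3DGS simulator, no flow-matching predictor, and nothing here is differentiable. The 2D geometry, the reward terms, and every name (`sparse_reward`, `dense_reward`, `probe`, `straight_line`, `radius`, `margin`) are assumptions chosen only for illustration.

```python
import math

def straight_line(end, steps=11):
    """Hypothetical candidate generator: a straight path from the origin to `end`."""
    return [(end[0] * i / (steps - 1), end[1] * i / (steps - 1)) for i in range(steps)]

def sparse_reward(traj, obstacles, radius=1.0):
    """Event-based reward: -1 if any waypoint lies inside an obstacle, else 0."""
    for p in traj:
        if any(math.dist(p, o) < radius for o in obstacles):
            return -1.0
    return 0.0

def dense_reward(traj, obstacles, goal, radius=1.0, margin=3.0):
    """Toy shaped reward: progress toward the goal minus a proximity penalty."""
    progress = math.dist(traj[0], goal) - math.dist(traj[-1], goal)
    penalty = 0.0
    for p in traj:
        d = min(math.dist(p, o) for o in obstacles)
        # Penalty grows linearly from 0 at `margin` to 1 at `radius`, capped at 1.
        penalty += min(1.0, max(0.0, (margin - d) / (margin - radius)))
    return progress - penalty

def probe(candidates, obstacles, goal):
    """Score every candidate with the dense reward and return the best index."""
    scores = [dense_reward(t, obstacles, goal) for t in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores

# Three assumed modes: go straight, veer right, veer left (into the obstacle).
obstacles = [(5.0, 1.5)]
goal = (10.0, 0.0)
candidates = [straight_line((10.0, 0.0)),
              straight_line((10.0, -3.0)),
              straight_line((10.0, 3.0))]

sparse = [sparse_reward(t, obstacles) for t in candidates]
best, scores = probe(candidates, obstacles, goal)
print("sparse:", sparse)
print("dense :", [round(s, 2) for s in scores], "best mode:", best)
```

In this toy setup the sparse reward gives the two collision-free modes the same value, so it carries no signal for ranking them, whereas the dense score separates all three candidates before any collision occurs.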