We introduce DreamerAD, the first latent world model framework that enables efficient reinforcement learning (RL) for autonomous driving. By compressing diffusion sampling from 100 steps to 1, it achieves an 80x speedup while maintaining visual interpretability. Training RL policies on real-world driving data incurs prohibitive costs and safety risks. Existing pixel-level diffusion world models enable safe imagination-based training, but their multi-step diffusion inference latency (2 s/frame) prevents high-frequency RL interaction. Our approach leverages denoised latent features from video generation models through three key mechanisms: (1) shortcut forcing, which reduces sampling complexity via recursive multi-resolution step compression; (2) an autoregressive dense reward model that operates directly on latent representations for fine-grained credit assignment; and (3) Gaussian vocabulary sampling for GRPO, which constrains exploration to physically plausible trajectories. DreamerAD achieves 87.7 EPDMS on NavSim v2, establishing state-of-the-art performance and demonstrating that latent-space RL is effective for autonomous driving.
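To make mechanism (3) more concrete, the following is a minimal sketch of how sampling from a trajectory vocabulary could feed a GRPO-style update. The abstract does not specify the procedure, so everything here is an assumption made for illustration: the vocabulary is taken to be a fixed list of candidate waypoint sequences, candidates are weighted by a Gaussian kernel on their mean waypoint distance to an anchor trajectory, and the names `trajectory_distance`, `gaussian_vocab_sample`, `group_relative_advantages`, and the parameter `sigma` are hypothetical. Only the group-relative normalization of rewards (subtract the group mean, divide by the group standard deviation) follows the standard GRPO formulation.

```python
import math
import random
from statistics import fmean, pstdev


def trajectory_distance(a, b):
    """Mean Euclidean distance between matching (x, y) waypoints."""
    return fmean(math.dist(p, q) for p, q in zip(a, b))


def gaussian_vocab_sample(anchor, vocab, sigma, k, rng):
    """Draw k vocabulary indices with Gaussian weights around the anchor.

    Assumption: candidates close to the anchor trajectory get high weight,
    so exploration stays near trajectories that are already in the vocabulary.
    """
    weights = [
        math.exp(-trajectory_distance(anchor, t) ** 2 / (2 * sigma ** 2))
        for t in vocab
    ]
    return rng.choices(range(len(vocab)), weights=weights, k=k)


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO normalization: (r - group mean) / (group std + eps)."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy vocabulary: straight paths with different lateral offsets (metres).
    offsets = [-2.0, -1.0, 0.0, 1.0, 2.0]
    vocab = [[(float(i), off) for i in range(4)] for off in offsets]
    anchor = vocab[2]  # pretend the policy proposes the centred path

    group = gaussian_vocab_sample(anchor, vocab, sigma=1.0, k=8, rng=rng)
    # Toy reward standing in for a learned reward model: penalize lateral offset.
    rewards = [-abs(offsets[i]) for i in group]
    print(list(zip(group, group_relative_advantages(rewards))))
```

In this sketch a smaller `sigma` concentrates the sampled group near the anchor, while a larger one spreads it across the vocabulary; in either case every sample is a vocabulary entry, which is one way to read "constrains exploration to physically plausible trajectories".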