Diffusion policies excel at visuomotor control but often fail catastrophically under severe out-of-distribution (OOD) disturbances, such as unexpected object displacements or visual corruptions. To address this vulnerability, we introduce the Dream Diffusion Policy (DDP), a framework that integrates a diffusion world model into the policy's training objective via a shared 3D visual encoder. This co-optimization gives the policy robust state-prediction capabilities. When it encounters a sudden OOD anomaly during inference, DDP detects the discrepancy between reality and imagination and abandons the corrupted visual stream. It then relies on its internal "imagination" (autoregressively forecasted latent dynamics) to bypass the disruption, generating imagined trajectories before smoothly realigning with physical reality. Extensive evaluations demonstrate DDP's resilience. Notably, DDP achieves a 73.8% OOD success rate on MetaWorld (vs. 23.9% without predictive imagination) and an 83.3% success rate under severe real-world spatial shifts (vs. 3.3% without predictive imagination). Furthermore, as a stress test, DDP maintains a 76.7% real-world success rate even when relying entirely on open-loop imagination after initialization.
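The abstract does not state which discrepancy metric, threshold, or switching rule DDP uses, so the following is only a minimal sketch of the general idea: compare each observed latent against the world model's forecast and fall back to the forecast when the two disagree. The Euclidean distance, the fixed `threshold`, the function names `select_latent` and `rollout`, and the toy `toy_dynamics` predictor (a stand-in for a learned diffusion world model) are all illustrative assumptions and are not taken from the paper.

```python
import math

def select_latent(z_real, z_imagined, threshold):
    """Trust the observed latent unless it is too far from the forecast.

    Returns (latent_to_use, used_imagination). The Euclidean metric and the
    fixed threshold are assumptions made for this sketch.
    """
    discrepancy = math.dist(z_real, z_imagined)
    if discrepancy > threshold:
        return z_imagined, True
    return z_real, False

def rollout(z0, observed_latents, predict_next, threshold):
    """Step through observed latents, substituting forecasts on anomalies."""
    z = z0
    flags = []
    for z_real in observed_latents:
        # Autoregressive forecast from the last trusted latent.
        z_imagined = predict_next(z)
        z, used_imagination = select_latent(z_real, z_imagined, threshold)
        flags.append(used_imagination)
    return z, flags

def toy_dynamics(z):
    """Hypothetical stand-in for a learned world model: add 1 per step."""
    return [v + 1.0 for v in z]

# Clean latents follow the toy dynamics; step index 2 is corrupted to
# simulate a sudden OOD observation.
z0 = [0.0, 0.0]
observed = [[float(t), float(t)] for t in range(1, 6)]
observed[2] = [50.0, 50.0]

final_latent, flags = rollout(z0, observed, toy_dynamics, threshold=0.5)
print(flags)         # [False, False, True, False, False]
print(final_latent)  # [5.0, 5.0]
```

In this toy run the corrupted frame is replaced by the forecast for one step, after which the observed latents agree with the forecast again and the loop returns to using real observations. That mirrors, in a very simplified form, the "bypass, then realign" behavior described above. Passing an empty list of observations and calling `predict_next` repeatedly instead would correspond to the fully open-loop stress test.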