We study diffusion-based world models for reinforcement learning, which offer high generative fidelity but face critical efficiency challenges in control. Current methods either require heavyweight models at inference or rely on highly sequential imagination, both of which impose prohibitive computational costs. We propose Horizon Imagination (HI), an on-policy imagination process for discrete stochastic policies that denoises multiple future observations in parallel. HI incorporates a stabilization mechanism and a novel sampling schedule that decouples the denoising budget from the effective horizon over which denoising is applied while also supporting sub-frame budgets. Experiments on Atari 100K and Craftium show that our approach maintains control performance with a sub-frame budget of half the denoising steps and achieves superior generation quality under varied schedules. Code is available at https://github.com/leor-c/horizon-imagination.
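To make the core idea concrete, here is a minimal toy sketch of parallel horizon denoising with a budget decoupled from the horizon length. This is an illustrative assumption-laden example, not the paper's actual HI algorithm: the names `denoise_step`, `H`, and `budget` are hypothetical, and the "denoiser" is a trivial contraction toward a fixed target. It only demonstrates how a total denoising budget `B` can be spread over `H` future frames at once, including sub-frame budgets where `B < H`.

```python
import numpy as np

rng = np.random.default_rng(0)

def denoise_step(frames, noise_levels):
    # Toy stand-in for a learned denoiser: each call pulls every frame
    # halfway toward a fixed clean target and halves its noise level.
    target = np.zeros_like(frames)
    return frames + 0.5 * (target - frames), noise_levels * 0.5

H = 8        # effective horizon: number of future observations imagined
budget = 4   # total denoising steps; budget < H gives a sub-frame budget

frames = rng.normal(size=(H, 4))  # H noisy future observations (toy features)
noise = np.ones(H)                # per-frame noise level

for _ in range(budget):
    # All H frames are refined together in one parallel step, so the
    # per-step cost does not grow with the horizon over which we imagine.
    frames, noise = denoise_step(frames, noise)

print(noise.max())  # residual noise after spending the whole budget: 0.0625
```

The point of the sketch is the loop structure: the number of iterations is set by `budget` alone, while `H` only widens the batch, which is what lets a schedule spend fewer than one denoising step per imagined frame.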