Diffusion models have proven highly competitive on offline reinforcement learning tasks by formulating decision-making as sequential generation. However, their practicality is limited by the lengthy inference process they require. In this paper, we address this problem by decomposing the sampling process of diffusion models into two decoupled subprocesses: 1) generating a feasible trajectory, which is the time-consuming part, and 2) optimizing that trajectory. This decomposition partially disentangles the efficiency and quality factors, letting us gain inference speed without giving up quality. We propose the Trajectory Diffuser, which uses a faster autoregressive model to generate feasible trajectories while retaining the trajectory optimization process of diffusion models, thereby achieving more efficient planning without sacrificing capability. To evaluate the effectiveness and efficiency of the Trajectory Diffuser, we conduct experiments on the D4RL benchmarks. The results demonstrate that our method achieves $3$-$10\times$ faster inference than previous sequence modeling methods while also outperforming them in overall performance. Code is available at https://github.com/RenMing-Huang/TrajectoryDiffuser.

Keywords: Reinforcement Learning · Efficient Planning · Diffusion Model
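To make the decomposition concrete, below is a minimal sketch of the two-stage sampling it describes, not the authors' implementation: a fast autoregressive model proposes a feasible trajectory in a single pass, and a diffusion model then refines the proposal with only a few reverse denoising steps instead of running the full chain from pure noise. All names (`ARProposer`, `Denoiser`), the GRU/MLP stand-in architectures, the dimensions, and the choice of `k = 10` refinement steps out of `T = 100` are illustrative assumptions; see the linked repository for the actual method.

```python
# Sketch of two-stage planning: AR proposal + short diffusion refinement.
# Hypothetical stand-ins only; not the Trajectory Diffuser implementation.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, HORIZON = 17, 6, 32      # assumed sizes for illustration
TRANS_DIM = OBS_DIM + ACT_DIM              # one (state, action) transition

class ARProposer(nn.Module):
    """Hypothetical autoregressive trajectory generator (stand-in: a GRU)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(TRANS_DIM, 128, batch_first=True)
        self.head = nn.Linear(128, TRANS_DIM)

    def forward(self, first):                        # first: (B, TRANS_DIM)
        steps, x, h = [first], first.unsqueeze(1), None
        for _ in range(HORIZON - 1):
            out, h = self.rnn(x, h)
            x = self.head(out)                       # predict next transition
            steps.append(x.squeeze(1))
        return torch.stack(steps, dim=1)             # (B, HORIZON, TRANS_DIM)

class Denoiser(nn.Module):
    """Hypothetical noise-prediction network eps_theta (stand-in: an MLP)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * TRANS_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, HORIZON * TRANS_DIM))

    def forward(self, x, t):                         # x: (B, HORIZON, TRANS_DIM)
        b = x.shape[0]
        t_emb = torch.full((b, 1), float(t))         # scalar timestep embedding
        return self.net(torch.cat([x.reshape(b, -1), t_emb], -1)).reshape_as(x)

T = 100                                              # full diffusion length
betas = torch.linspace(1e-4, 2e-2, T)                # assumed linear schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def plan(first_transition, proposer, denoiser, k=10):
    """Stage 1: fast AR proposal. Stage 2: only k << T denoising steps."""
    traj = proposer(first_transition)
    # Forward-noise the proposal to level k-1, then denoise it back down,
    # so the reverse process is warm-started instead of run from pure noise.
    x = (alpha_bars[k - 1].sqrt() * traj
         + (1 - alpha_bars[k - 1]).sqrt() * torch.randn_like(traj))
    for t in reversed(range(k)):                     # DDPM-style reverse steps
        eps = denoiser(x, t)
        x = (x - betas[t] / (1 - alpha_bars[t]).sqrt() * eps) / alphas[t].sqrt()
        if t > 0:
            x = x + betas[t].sqrt() * torch.randn_like(x)
    return x                                         # refined trajectory plan

trajectory = plan(torch.randn(1, TRANS_DIM), ARProposer(), Denoiser())
```

Under these assumptions, the expensive part of generation is amortized into one autoregressive pass, while the few remaining reverse steps preserve the diffusion model's role as a trajectory optimizer; this mirrors, at sketch level, how the abstract separates the efficiency and quality factors.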