Offline planning often suffers from poor sampling efficiency because policies are learned from scratch. With diffusion models in particular, this cold start makes both training and sampling expensive. We hypothesize that certain environment-constraint priors or cheaply available policies make learning from scratch unnecessary, and we explore a way to incorporate such priors into the learning process. To this end, we borrow a variant of the Schr\"odinger bridge formulation from the image-to-image setting and apply it to planning tasks. We evaluate the approach on several planning tasks and compare its performance against the DDPM formulation. The code for this work is available at https://github.com/adrshsrvstv/bridge_diffusion_planning.