Guided sampling, which embeds human-defined guidance in the sampling procedure, is a vital approach for applying diffusion models to real-world tasks. This paper considers a general setting in which the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance, together with a novel training objective named contrastive energy prediction (CEP) for learning it. Our method is guaranteed to converge to the exact guidance given unlimited model capacity and data samples, whereas previous methods are not. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL): extensive experiments on the D4RL benchmarks show that it outperforms existing state-of-the-art algorithms. We also provide examples of applying CEP to image synthesis to demonstrate its scalability to high-dimensional data.
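
To make "intermediate guidance jointly defined by the sampling distribution and the energy function" concrete, the following is a minimal sketch of the setting. It is not quoted from the abstract. The notation is assumed for illustration: a data distribution q_0, an energy E, an inverse temperature \beta, and a forward diffusion kernel q_{t|0} shared by both processes. Under these assumptions, Bayes' rule gives the intermediate energy in closed form as a posterior expectation, which is intractable in practice and is presumably the quantity that has to be learned.

```latex
% Assumed target: the data distribution reweighted by an unnormalized energy.
p_0(x_0) \;\propto\; q_0(x_0)\, e^{-\beta E(x_0)}

% Diffusing p_0 with the same forward kernel q_{t|0} gives the marginal at time t:
p_t(x_t) \;\propto\; \int q_{t|0}(x_t \mid x_0)\, q_0(x_0)\, e^{-\beta E(x_0)}\, \mathrm{d}x_0
        \;=\; q_t(x_t)\, \mathbb{E}_{q_{0|t}(x_0 \mid x_t)}\!\left[ e^{-\beta E(x_0)} \right]

% Hence p_t \propto q_t\, e^{-E_t}, with the intermediate energy
E_t(x_t) \;=\; -\log \mathbb{E}_{q_{0|t}(x_0 \mid x_t)}\!\left[ e^{-\beta E(x_0)} \right]

% and the score used for guided sampling decomposes as
\nabla_{x_t} \log p_t(x_t) \;=\; \nabla_{x_t} \log q_t(x_t) \;-\; \nabla_{x_t} E_t(x_t)
```

In this sketch the first score term comes from an ordinary pretrained diffusion model, while the second term is the intermediate guidance. At t = 0 the posterior collapses onto x_0, so the intermediate energy reduces to \beta E(x_0).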