Guided sampling, which embeds human-defined guidance into the sampling procedure, is a vital approach for applying diffusion models to real-world tasks. This paper considers a general setting in which the guidance is defined by an (unnormalized) energy function. The main challenge in this setting is that the intermediate guidance during the diffusion sampling procedure, which is jointly defined by the sampling distribution and the energy function, is unknown and hard to estimate. To address this challenge, we propose an exact formulation of the intermediate guidance, together with a novel training objective named contrastive energy prediction (CEP) for learning the exact guidance. Our method is guaranteed to converge to the exact guidance given unlimited model capacity and data samples, whereas previous methods cannot. We demonstrate the effectiveness of our method by applying it to offline reinforcement learning (RL). Extensive experiments on the D4RL benchmarks demonstrate that our method outperforms existing state-of-the-art algorithms. We also present examples of applying CEP to image synthesis to demonstrate its scalability to high-dimensional data.
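To make the idea of a contrastive training objective more concrete, the following is a minimal NumPy sketch of one plausible softmax-based form. It rests on assumptions that this abstract does not state: each training group holds K clean samples x_0 with known energies E(x_0), their noised counterparts x_t, and one model output f(x_t, t) per sample; the target distribution is softmax(-βE(x_0)) over the group and the predicted distribution is softmax(-f(x_t, t)). The function name `cep_loss` and the temperature `beta` are hypothetical. The sketch only illustrates the contrastive, cross-entropy structure and is not the paper's exact objective.

```python
import numpy as np


def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def cep_loss(energy_x0, pred_energy_xt, beta=1.0):
    """Hypothetical sketch of a contrastive energy-prediction loss.

    energy_x0:      (B, K) known energies E(x_0^i) of K clean samples per group.
    pred_energy_xt: (B, K) model outputs f(x_t^i, t) on the noised samples.
    Assumed target:     softmax(-beta * E(x_0)) within each group of K.
    Assumed prediction: softmax(-f(x_t, t)) within each group of K.
    Returns the cross-entropy between target and prediction, averaged over groups.
    """
    target = np.exp(log_softmax(-beta * np.asarray(energy_x0, dtype=float)))
    log_pred = log_softmax(-np.asarray(pred_energy_xt, dtype=float))
    return float(-(target * log_pred).sum(axis=-1).mean())


# Usage: one group of K = 4 samples with increasing energies.
E = np.array([[0.0, 1.0, 2.0, 3.0]])
print(cep_loss(E, 2.0 * E + 5.0, beta=2.0))  # predictions agree with the target
print(cep_loss(E, -2.0 * E, beta=2.0))       # predictions rank samples in reverse
```

In this sketch the loss reaches its minimum, the entropy of the target distribution, when the predicted energies match βE(x_0) up to an additive constant within each group, because softmax is invariant to such shifts; predictions that order the samples the wrong way incur a strictly larger loss.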