Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Training energy-based models (EBMs) with maximum likelihood estimation on high-dimensional data can be both challenging and time-consuming. As a result, there is a noticeable gap in sample quality between EBMs and other generative frameworks such as GANs and diffusion models. To close this gap, inspired by recent efforts to learn EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, paired with an initializer model for each EBM. At each noise level, the initializer model learns to amortize the sampling process of the EBM, and the two models are jointly estimated within a cooperative training framework. Samples from the initializer serve as starting points that are refined by a few sampling steps from the EBM. With the refined samples, the EBM is optimized by maximizing recovery likelihood, while the initializer is optimized by learning from the difference between the refined samples and the initial samples. We develop a new noise schedule and a variance reduction technique to further improve sample quality. Combining these advances, we significantly improve FID scores compared to existing EBM methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. In addition, we extend our method to compositional generation and image inpainting tasks, and showcase the compatibility of CDRL with classifier-free guidance for conditional generation, achieving trade-offs between sample quality and sample diversity similar to those in diffusion models.
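The cooperative loop described above (initializer proposes, EBM refines, both models update) can be sketched on a toy 1-D problem. This is a minimal illustration under strong assumptions, not the authors' implementation: the quadratic energy with parameters `a`, `b`, the linear initializer with parameters `w`, `c`, the single noise level `sigma`, the plain additive noise, the use of Langevin dynamics as the sampler, and all step sizes are hypothetical choices made so the gradients can be written by hand. The paper's noise schedule and variance reduction technique are not reproduced here.

```python
import numpy as np

def grad_log_cond(x, x_noisy, a, b, sigma):
    """Gradient in x of the toy conditional log-density
    f(x) - (x_noisy - x)^2 / (2 sigma^2), with f(x) = -0.5*a*x^2 + b*x (assumed form)."""
    return -a * x + b + (x_noisy - x) / sigma**2

def cdrl_step(x_data, params, rng, sigma=0.5, n_langevin=10, step=0.1, lr=0.05):
    """One cooperative update at a single noise level (toy sketch)."""
    a, b, w, c = params
    # Noisy observation of the data (plain additive Gaussian noise, an assumption).
    x_noisy = x_data + sigma * rng.standard_normal(x_data.shape)

    # 1) Initializer proposes starting points from the noisy input.
    x_init = w * x_noisy + c

    # 2) A few Langevin steps on the EBM conditional refine the proposals.
    x = x_init.copy()
    for _ in range(n_langevin):
        drift = 0.5 * step**2 * grad_log_cond(x, x_noisy, a, b, sigma)
        x = x + drift + step * rng.standard_normal(x.shape)

    # 3) EBM update: likelihood gradient is the data expectation of df/dtheta
    #    minus the expectation over refined samples.
    grad_a = np.mean(-0.5 * x_data**2) - np.mean(-0.5 * x**2)
    grad_b = np.mean(x_data) - np.mean(x)
    a = max(a + lr * grad_a, 1e-3)  # clamp keeps the toy energy well-behaved
    b = b + lr * grad_b

    # 4) Initializer update: regress proposals toward the refined samples
    #    (one gradient step on the squared difference).
    resid = x_init - x
    w = w - lr * np.mean(2.0 * resid * x_noisy)
    c = c - lr * np.mean(2.0 * resid)

    return (a, b, w, c), x_init, x
```

Calling `cdrl_step` repeatedly with a `np.random.default_rng` generator and a 1-D array of data alternates the two updates. In a full model both the energy and the initializer would be neural networks trained across a sequence of noise levels.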
