Learning Energy-Based Models by Cooperative Diffusion Recovery Likelihood

Training energy-based models (EBMs) with maximum likelihood estimation on high-dimensional data can be both challenging and time-consuming. As a result, there is a noticeable gap in sample quality between EBMs and other generative frameworks such as GANs and diffusion models. To close this gap, and inspired by recent work on learning EBMs by maximizing diffusion recovery likelihood (DRL), we propose cooperative diffusion recovery likelihood (CDRL), an effective approach to tractably learn and sample from a series of EBMs defined on increasingly noisy versions of a dataset, with each EBM paired with an initializer model. At each noise level, the initializer model learns to amortize the sampling process of the EBM, and the two models are jointly estimated within a cooperative training framework. Samples from the initializer serve as starting points that are refined by a few sampling steps from the EBM. With the refined samples, the EBM is optimized by maximizing recovery likelihood, while the initializer is optimized by learning from the difference between the refined samples and the initial samples. We develop a new noise schedule and a variance reduction technique to further improve sample quality. Combining these advances, we significantly improve FID scores over existing EBM methods on CIFAR-10 and ImageNet 32x32, with a 2x speedup over DRL. In addition, we extend our method to compositional generation and image inpainting, and show that CDRL is compatible with classifier-free guidance for conditional generation, achieving trade-offs between sample quality and sample diversity similar to those of diffusion models.
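The cooperative loop described above (the initializer proposes, the EBM refines with a few Langevin steps, the EBM follows the recovery-likelihood gradient, and the initializer regresses onto the refined samples) can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration, not the paper's setup: a single noise level instead of a series, 1-D Gaussian data, a quadratic energy f(x) = a*x - b*x^2/2 in place of a neural EBM, a linear-Gaussian initializer in place of a neural one, the simplified noising y = x + sigma*eps, and a hypothetical function `train_cdrl_toy` with made-up hyperparameters. The proposed noise schedule and variance reduction technique are omitted.

```python
import math
import random


def train_cdrl_toy(num_iters=2000, batch_size=200, sigma=0.5,
                   langevin_steps=5, step_size=0.05, init_std=0.5,
                   lr_ebm=0.05, lr_init=0.1, seed=0):
    """Hypothetical single-noise-level toy of cooperative recovery-likelihood training.

    Assumed toy setup (not the paper's models or settings):
      data         x ~ N(2.0, 0.5**2)
      noising      y = x + sigma * eps             (no sqrt(1 - sigma^2) scaling)
      EBM          f(x) = a*x - 0.5*b*x**2         (params a, b)
      initializer  x0 ~ N(c*y + d, init_std**2)    (params c, d)
    """
    rng = random.Random(seed)
    a, b = 0.0, 1.0  # EBM parameters
    c, d = 0.0, 0.0  # initializer parameters
    inv_var = 1.0 / sigma ** 2
    n = float(batch_size)

    for _ in range(num_iters):
        xs = [rng.gauss(2.0, 0.5) for _ in range(batch_size)]
        ys = [x + sigma * rng.gauss(0.0, 1.0) for x in xs]

        # 1) Initializer proposes starting points conditioned on the noisy y.
        x0s = [c * y + d + init_std * rng.gauss(0.0, 1.0) for y in ys]

        # 2) A few Langevin steps on the conditional log-density
        #    log p(x | y) = f(x) - (y - x)^2 / (2 sigma^2) + const.
        refined = []
        for x, y in zip(x0s, ys):
            for _ in range(langevin_steps):
                grad = a - b * x - (x - y) * inv_var
                x = x + step_size * grad + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
            refined.append(x)

        # 3) EBM ascent on recovery likelihood: E_data[df/dtheta] - E_refined[df/dtheta],
        #    where df/da = x and df/db = -x^2 / 2.
        grad_a = (sum(xs) - sum(refined)) / n
        grad_b = (sum(r * r for r in refined) - sum(x * x for x in xs)) / (2.0 * n)
        a += lr_ebm * grad_a
        b += lr_ebm * grad_b

        # 4) Initializer update: least-squares step pulling its mean toward the
        #    refined samples (residual = refined sample - initializer mean).
        resid = [r - (c * y + d) for r, y in zip(refined, ys)]
        c += lr_init * sum(e * y for e, y in zip(resid, ys)) / n
        d += lr_init * sum(resid) / n

    return a, b, c, d
```

Because the toy prior N(2, 0.5^2) and the noise have equal precision, the true conditional mean E[x | y = 2] is 2.0. After training, both the EBM's conditional mean (a + 2/sigma^2) / (b + 1/sigma^2) and the initializer's mean 2c + d should land close to it. The parameter b should not be expected to match the true precision 4 exactly: it converges slowly, and unadjusted Langevin steps slightly inflate the sample variance, which biases it.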
