Energy-based models (EBMs) are a popular generative framework that offers both an explicit density and architectural flexibility, but training them is difficult because it is often unstable and time-consuming. In recent years, various training techniques have been developed, e.g., better divergence measures or stabilized MCMC sampling, yet a large gap in generation quality often remains between EBMs and other generative frameworks such as GANs. In this paper, we propose a novel and effective framework for improving EBMs via contrastive representation learning (CRL). Specifically, we treat representations learned by contrastive methods as the true underlying latent variable. This contrastive latent variable can guide EBMs to better understand the data structure, and can thereby improve and accelerate EBM training significantly. To enable joint training of the EBM and CRL, we also design a new class of latent-variable EBMs for learning the joint density of data and the contrastive latent variable. Our experimental results demonstrate that our scheme achieves lower FID scores than prior-art EBM methods (e.g., those additionally using variational autoencoders or diffusion techniques), while training significantly faster and with less memory. We also show that our latent-variable EBMs support conditional and compositional generation as additional benefits, even without explicit conditional training. The code is available at https://github.com/hankook/CLEL.