Energy-Based Models (EBMs) have been widely used for generative modeling. Contrastive Divergence (CD), a prevailing training objective for EBMs, requires sampling from the EBM with Markov Chain Monte Carlo (MCMC) methods, which leads to an irreconcilable trade-off between the computational burden and the validity of the CD. Running MCMC until convergence is computationally intensive. On the other hand, short-run MCMC introduces an extra non-negligible parameter gradient term that is difficult to handle. In this paper, we provide a general interpretation of CD, viewing it as a special instance of our proposed Diffusion Contrastive Divergence (DCD) family. By replacing the Langevin dynamics used in CD with other EBM-parameter-free diffusion processes, we propose a more efficient divergence. We show that the proposed DCDs are both more computationally efficient than CD and not limited by the non-negligible gradient term. We conduct extensive experiments, including both synthetic data modeling and high-dimensional image denoising and generation, to show the advantages of the proposed DCDs. On the synthetic data learning and image denoising experiments, our proposed DCD outperforms CD by a large margin. In image generation experiments, the proposed DCD is capable of training an energy-based model for generating the CelebA $32\times 32$ dataset, with results comparable to existing EBMs.
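To make the trade-off described above concrete, the following is a minimal sketch of standard CD training with short-run Langevin sampling. It is not the proposed DCD. The toy energy $E_\theta(x) = \frac{1}{2}(x-\theta)^2$, the function names, and all hyperparameters are assumptions chosen only for illustration. Note that the negative samples depend on $\theta$ through every Langevin step, yet the CD estimate treats them as constants; this is the gradient term that short-run MCMC leaves unaccounted for.

```python
import numpy as np

def energy_grad_x(x, theta):
    # Toy energy (an assumption): E_theta(x) = 0.5 * (x - theta)^2, so dE/dx = x - theta.
    return x - theta

def energy_grad_theta(x, theta):
    # For the same toy energy, dE/dtheta = theta - x.
    return theta - x

def short_run_langevin(x0, theta, n_steps, step_size, rng):
    # Run a few Langevin steps starting from x0. The chain uses the EBM
    # parameter theta at every step, so the output depends on theta.
    x = x0.copy()
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step_size * energy_grad_x(x, theta) + np.sqrt(2.0 * step_size) * noise
    return x

def cd_gradient(x_data, theta, n_steps, step_size, rng):
    # Negative samples come from a short chain initialized at the data.
    x_neg = short_run_langevin(x_data, theta, n_steps, step_size, rng)
    # Standard CD estimate: x_neg is treated as constant with respect to theta,
    # which ignores the extra parameter gradient term of the short-run chain.
    return energy_grad_theta(x_data, theta).mean() - energy_grad_theta(x_neg, theta).mean()

def train(x_data, theta0=0.0, lr=0.5, iters=200, n_steps=5, step_size=0.1, seed=0):
    # Plain gradient descent on theta using the CD estimate above.
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(iters):
        theta -= lr * cd_gradient(x_data, theta, n_steps, step_size, rng)
    return theta

if __name__ == "__main__":
    data = np.random.default_rng(1).normal(2.0, 1.0, size=2000)
    print("learned theta:", train(data), "data mean:", data.mean())
```

In this toy setting the learned parameter settles near the data mean. Increasing `n_steps` brings the chain closer to convergence at a proportionally higher sampling cost, which is the computational side of the trade-off.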