We present Generalized Contrastive Divergence (GCD), a novel objective function for training an energy-based model (EBM) and a sampler simultaneously. GCD generalizes Contrastive Divergence (Hinton, 2002), a celebrated algorithm for training EBMs, by replacing the Markov Chain Monte Carlo (MCMC) distribution with a trainable sampler, such as a diffusion model. In GCD, the joint training of an EBM and a diffusion model is formulated as a minimax problem, which reaches an equilibrium when both models converge to the data distribution. Minimax learning with GCD bears an interesting equivalence to inverse reinforcement learning, where the energy corresponds to a negative reward, the diffusion model is the policy, and the real data are the expert demonstrations. We present preliminary yet promising results showing that joint training benefits both the EBM and the diffusion model. GCD enables EBM training without MCMC while improving the sample quality of the diffusion model.
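
As a sketch of the minimax problem described above, assume the EBM defines a density $q_\theta(x) \propto \exp(-E_\theta(x))$, the data distribution is $p$, and the trainable sampler is $\pi_\phi$. These symbols are illustrative notation and do not appear in the abstract. Contrastive Divergence is commonly written as $\mathrm{KL}(p \,\|\, q_\theta) - \mathrm{KL}(p_k \,\|\, q_\theta)$, with $p_k$ the distribution after $k$ MCMC steps. Replacing $p_k$ with $\pi_\phi$ gives one plausible form of the objective:

```latex
\min_{\theta}\,\max_{\phi}\;
\mathcal{L}(\theta,\phi)
= \mathrm{KL}\!\left(p \,\|\, q_\theta\right)
- \mathrm{KL}\!\left(\pi_\phi \,\|\, q_\theta\right)
= \mathbb{E}_{x\sim p}\!\left[E_\theta(x)\right]
- \mathbb{E}_{x\sim \pi_\phi}\!\left[E_\theta(x)\right]
+ \mathcal{H}(\pi_\phi) - \mathcal{H}(p)
```

Under this assumed form, the log-partition function of $q_\theta$ cancels between the two KL terms, which is consistent with training the EBM without MCMC. The inner maximization over $\phi$ amounts to entropy-regularized reward maximization with reward $-E_\theta$, matching the inverse reinforcement learning reading. It drives $\pi_\phi$ toward $q_\theta$, and the outer minimization then drives $q_\theta$ toward $p$, so the equilibrium is $\pi_\phi = q_\theta = p$.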