Energy-based models (EBMs) are generative models inspired by statistical physics with a wide range of applications in unsupervised learning. Their performance is best measured by the cross-entropy (CE) of the model distribution relative to the data distribution. Using the CE as the training objective is challenging, however, because computing its gradient with respect to the model parameters requires sampling the model distribution. Here we show how results from nonequilibrium thermodynamics based on the Jarzynski equality, together with tools from sequential Monte Carlo sampling, can be used to perform this computation efficiently and avoid the uncontrolled approximations made by the standard contrastive divergence algorithm. Specifically, we introduce a modification of the unadjusted Langevin algorithm (ULA) in which each walker acquires a weight that enables the estimation of the gradient of the cross-entropy at any step during gradient descent (GD), thereby bypassing sampling biases induced by slow mixing of ULA. We illustrate these results with numerical experiments on Gaussian mixture distributions as well as the MNIST dataset. We show that the proposed approach outperforms methods based on the contrastive divergence algorithm in all the situations considered.
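To make the weighted-walker idea concrete, here is a minimal sketch in plain Python; it is not the authors' implementation. Everything specific in it is an assumption made for illustration: the one-dimensional toy energy U_theta(x) = (x - theta)^2 / 2, the step size `h`, the learning rate `lr`, and the helper names `alpha`, `weighted_mean` and `train`. The log-weight increment is the standard sequential Monte Carlo weight obtained when the ULA kernel serves as both forward and backward proposal, which may differ in detail from the update used in the paper. Resampling of the walkers is omitted.

```python
import math
import random


def energy(x, theta):
    # Toy energy U_theta(x) = (x - theta)^2 / 2, so the model is N(theta, 1).
    # This choice is an assumption for illustration only.
    return 0.5 * (x - theta) ** 2


def grad_x(x, theta):
    # dU/dx, used in the Langevin drift.
    return x - theta


def grad_theta(x, theta):
    # dU/dtheta, which enters the cross-entropy gradient.
    return theta - x


def alpha(x, y, theta, h):
    # Minus the log of exp(-U_theta(x)) times the ULA transition density
    # from x to y, dropping the |y - x|^2 / (4h) term, which is symmetric
    # in (x, y) and cancels between the forward and backward moves.
    g = grad_x(x, theta)
    return energy(x, theta) + 0.5 * (y - x) * g + 0.25 * h * g * g


def weighted_mean(values, log_w):
    # Self-normalized weighted average; the largest log-weight is
    # subtracted before exponentiating for numerical stability.
    top = max(log_w)
    w = [math.exp(a - top) for a in log_w]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


def train(data, n_walkers=4000, n_steps=200, h=0.1, lr=0.05, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    # Walkers start as exact samples of the initial model, with log-weight 0.
    xs = [rng.gauss(theta, 1.0) for _ in range(n_walkers)]
    log_w = [0.0] * n_walkers
    noise_scale = math.sqrt(2.0 * h)
    for _ in range(n_steps):
        # CE gradient: E_data[dU/dtheta] - E_model[dU/dtheta], with the
        # model expectation estimated from the weighted walkers.
        data_term = sum(grad_theta(x, theta) for x in data) / len(data)
        model_term = weighted_mean([grad_theta(x, theta) for x in xs], log_w)
        new_theta = theta - lr * (data_term - model_term)
        for i, x in enumerate(xs):
            # ULA move under the current parameter.
            y = x - h * grad_x(x, theta) + noise_scale * rng.gauss(0.0, 1.0)
            # Weight update: forward move scored under theta, reverse move
            # scored under the updated parameter.
            log_w[i] += alpha(x, y, theta, h) - alpha(y, x, new_theta, h)
            xs[i] = y
        theta = new_theta
    return theta, xs, log_w


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(1.5, 1.0) for _ in range(500)]
    theta, xs, log_w = train(data)
    print("learned theta:", theta, "data mean:", sum(data) / len(data))
    print("weighted walker mean:", weighted_mean(xs, log_w))
```

In this toy setting the minimizer of the cross-entropy is the sample mean of the data, so the learned `theta` can be checked against it directly. The unweighted walkers lag behind the moving parameter because ULA relaxes at a finite rate; the weights are what compensate for that lag when the model expectation is estimated.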