Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED), which relies neither on the computation of scores nor on expensive Markov chain Monte Carlo. We show that ED approaches the explicit score-matching loss and the negative log-likelihood loss in different limits, effectively interpolating between the two. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis limits our approach, and we demonstrate the effectiveness of energy discrepancy by training the energy-based model as the prior of a variational decoder model.
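The abstract does not spell out the loss, so the following Python sketch is illustrative only and rests on definitions we assume here: ED(p, U) = E_{x~p}[U(x)] - E_{x~p, y~q(y|x)}[U_q(y)], with U_q(y) = -log ∫ q(y|x) exp(-U(x)) dx, and a Gaussian perturbation q(y|x) = N(y; x, tI). Because this q is symmetric, U_q(y) can be estimated from points x' ~ N(y, tI), so the estimate needs only energy evaluations, with no scores and no Markov chain Monte Carlo. The names `energy_discrepancy_loss` and `logsumexp` and the parameters `t`, `num_negatives` and `w` are hypothetical, and `w` in particular is an assumed stabilizer rather than a confirmed detail of the method.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty sequence."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def energy_discrepancy_loss(
    energy: Callable[[Vector], float],
    data: Sequence[Vector],
    t: float = 0.1,
    num_negatives: int = 16,
    w: float = 0.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Monte Carlo estimate of ED under an assumed Gaussian perturbation
    q(y | x) = N(y; x, t * I).

    For each data point x (with M = num_negatives):
        y    = x + sqrt(t) * noise       # perturb the data point
        x'_j = y + sqrt(t) * noise_j     # M contrastive points drawn around y
        term = log((w + sum_j exp(U(x) - U(x'_j))) / M)
    The loss is the mean of these terms. Only energies are evaluated:
    no gradients of the energy (scores) and no MCMC chains.
    w >= 0 is a hypothetical stabilizer; w = 0 gives the plain estimator.
    """
    rng = rng if rng is not None else random.Random(0)
    sd = math.sqrt(t)
    total = 0.0
    for x in data:
        y = [xk + sd * rng.gauss(0.0, 1.0) for xk in x]
        u_x = energy(x)
        logits = []
        for _ in range(num_negatives):
            x_neg = [yk + sd * rng.gauss(0.0, 1.0) for yk in y]
            logits.append(u_x - energy(x_neg))
        if w > 0.0:
            logits.append(math.log(w))  # adds exactly w inside the sum
        total += logsumexp(logits) - math.log(num_negatives)
    return total / len(data)


if __name__ == "__main__":
    # Toy check: data from N(0, 1); compare the matching quadratic energy
    # U(x) = x^2 / 2 with a too-narrow one, U(x) = x^2 / (2 * 0.3^2).
    data_rng = random.Random(1)
    data = [[data_rng.gauss(0.0, 1.0)] for _ in range(2000)]
    good = energy_discrepancy_loss(lambda x: x[0] ** 2 / 2, data)
    narrow = energy_discrepancy_loss(lambda x: x[0] ** 2 / (2 * 0.09), data)
    print(f"matching energy: {good:.3f}, too-narrow energy: {narrow:.3f}")
```

Each call reseeds its own generator, so two energies compared on the same data see identical perturbations, which keeps the comparison low-variance. With `w > 0`, every term is bounded below by log(w / M), which is one plausible way to stop training from pushing the loss toward minus infinity; treat that choice as an assumption rather than the paper's prescription.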