Energy-based models are a simple yet powerful class of probabilistic models, but their widespread adoption has been limited by the computational burden of training them. We propose a novel loss function called Energy Discrepancy (ED) that requires neither the computation of scores nor expensive Markov chain Monte Carlo (MCMC) sampling. We show that ED approaches the explicit score-matching loss and the negative log-likelihood loss in different limits, effectively interpolating between the two. Consequently, minimum ED estimation overcomes the problem of nearsightedness encountered in score-based estimation methods, while also enjoying theoretical guarantees. Through numerical experiments, we demonstrate that ED learns low-dimensional data distributions faster and more accurately than explicit score matching or contrastive divergence. For high-dimensional image data, we describe how the manifold hypothesis limits our approach, and we demonstrate the effectiveness of energy discrepancy by training the energy-based model as the prior of a variational decoder model.
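The abstract does not state how ED is computed. As a minimal sketch, assume the definition $\mathrm{ED}_q(p_{\text{data}}, U) = \mathbb{E}_{p_{\text{data}}(x)}[U(x)] - \mathbb{E}_{p_{\text{data}}(x)\,q(y \mid x)}[U_q(y)]$ with contrastive potential $U_q(y) = -\log \int q(y \mid x)\, e^{-U(x)}\,dx$ and a Gaussian perturbation $q(y \mid x) = \mathcal{N}(y; x, t)$. Under these assumptions the contrastive term can be estimated from $M$ samples $x' \sim \mathcal{N}(y, t)$ using only energy evaluations, with no scores and no MCMC. The toy energy `energy`, the helper names, and the plain Monte Carlo estimator below are illustrative assumptions, not necessarily the estimator used in this work.

```python
import math
import random


def energy(x, mu):
    """Hypothetical toy energy U(x) = (x - mu)^2 / 2, an unnormalised Gaussian."""
    return 0.5 * (x - mu) ** 2


def log_mean_exp(values):
    """Numerically stable log((1/M) * sum(exp(v))) over a list of values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def energy_discrepancy(data, mu, t=1.0, num_neg=64, seed=0):
    """Monte Carlo estimate of energy discrepancy (assumed Gaussian perturbation).

    Assumes q(y|x) = N(y; x, t). The kernel is symmetric in x and y, so
    exp(-U_q(y)) = E_{x' ~ N(y, t)}[exp(-U(x'))], and the per-sample loss is
    log (1/M) sum_j exp(U(x) - U(x'_j)). Only energies are evaluated:
    no gradients with respect to x and no Markov chains.
    """
    rng = random.Random(seed)  # fixed seed: same noise for every mu
    s = math.sqrt(t)
    total = 0.0
    for x in data:
        y = x + s * rng.gauss(0.0, 1.0)  # perturb the data point once
        u_x = energy(x, mu)
        # Contrast U(x) with the energies of M samples drawn around y.
        diffs = [u_x - energy(y + s * rng.gauss(0.0, 1.0), mu)
                 for _ in range(num_neg)]
        total += log_mean_exp(diffs)
    return total / len(data)


if __name__ == "__main__":
    data_rng = random.Random(42)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(2000)]  # samples from N(0, 1)
    for mu in (0.0, 1.5, 3.0):
        print(f"mu={mu}: ED estimate {energy_discrepancy(data, mu):.3f}")
```

For this toy setting (data from $\mathcal{N}(0, 1)$, energy $U(x) = (x - \mu)^2/2$), the exact value works out to $\mu^2 t / (2(1 + t)) - \tfrac{1}{2}\log(1 + t)$, which is minimised at the true mean $\mu = 0$. The estimate should track this curve up to Monte Carlo error and a small negative bias from taking the logarithm of a sample mean.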