Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution, rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical ``cost'' can be steep: recent work by Koehler et al. (2022) showed that for distributions with poor isoperimetric properties (a large Poincar\'e or log-Sobolev constant), score matching is substantially less statistically efficient than maximum likelihood. However, many natural, realistic distributions -- e.g. multimodal distributions as simple as a mixture of two Gaussians in one dimension -- have a poor Poincar\'e constant. In this paper, we show a close connection between the mixing time of a broad class of Markov processes with generator $\mathcal{L}$ and an appropriately chosen generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. This allows us to adapt techniques for speeding up Markov chains to construct better score-matching losses. In particular, ``preconditioning'' the diffusion can be translated to an appropriate ``preconditioning'' of the score loss. Lifting the chain by adding a temperature, as in simulated tempering, can be shown to result in a Gaussian-convolution annealed score matching loss, similar to Song and Ermon (2019). Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance -- obviating the Poincar\'e constant-based lower bounds of the basic score matching loss shown in Koehler et al. (2022).
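As a minimal illustration of the basic idea only (not the generalized or annealed losses studied in the paper), the sketch below fits a one-dimensional Gaussian by score matching without ever evaluating its normalizing constant. It assumes the standard Hyv\"arinen form of the objective, $\mathbb{E}_p\left[\tfrac{1}{2} s_\theta(x)^2 + s_\theta'(x)\right]$ with $s_\theta = \partial_x \log p_\theta$; the function names `score_matching_loss` and `fit_by_score_matching`, the precision parametrization `lam`, and the synthetic data are assumptions made for this example.

```python
import random


def score_matching_loss(xs, mu, lam):
    """Empirical score matching loss for an unnormalized 1-D Gaussian.

    Assumed model: log p(x) = -lam * (x - mu)**2 / 2 + const.
    The constant is never evaluated, since only derivatives in x appear.
    """
    total = 0.0
    for x in xs:
        score = -lam * (x - mu)   # d/dx log p(x); the constant drops out
        score_grad = -lam         # d/dx of the score
        total += 0.5 * score * score + score_grad
    return total / len(xs)


def fit_by_score_matching(xs):
    """Closed-form minimizer of the loss above (hypothetical helper).

    Setting the gradient in (mu, lam) to zero gives the sample mean
    and the inverse of the sample second central moment.
    """
    mu_hat = sum(xs) / len(xs)
    second_moment = sum((x - mu_hat) ** 2 for x in xs) / len(xs)
    return mu_hat, 1.0 / second_moment


# Synthetic data: mean 2.0, standard deviation 1.5 (precision 1 / 2.25).
rng = random.Random(0)
samples = [rng.gauss(2.0, 1.5) for _ in range(20000)]
mu_hat, lam_hat = fit_by_score_matching(samples)
```

For this unimodal family the score matching estimate coincides with the maximum likelihood estimate, so no statistical efficiency is lost here; the gap described in the abstract concerns distributions with poor isoperimetry, such as well-separated mixtures.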