Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains

Score matching is an approach to learning probability distributions parametrized up to a constant of proportionality (e.g. Energy-Based Models). The idea is to fit the score of the distribution (the gradient of its log-density) rather than the likelihood, thus avoiding the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. 2022 showed that for distributions with poor isoperimetric properties (a large Poincar\'e or log-Sobolev constant), score matching is substantially less statistically efficient than maximum likelihood. However, many natural realistic distributions, e.g. multimodal distributions as simple as a mixture of two Gaussians in one dimension, have a poor Poincar\'e constant. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and a generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. If $\mathcal{L}$ is the generator of a Markov process corresponding to a continuous version of simulated tempering, we show that the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed in Song and Ermon 2019. Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance, obviating the Poincar\'e constant-based lower bounds for the basic score matching loss shown in Koehler et al. 2022. This is the first result characterizing the benefits of annealing for score matching, a crucial component in more sophisticated score-based approaches like Song and Ermon 2019.
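To make the Gaussian-convolution idea concrete, the following is a minimal sketch, not the paper's estimator. It assumes a one-dimensional mixture with a shared variance and relies only on the standard fact that convolving $N(\mu, v)$ with $N(0, \sigma^2)$ gives $N(\mu, v + \sigma^2)$, so the smoothed mixture keeps its means and weights. The function name `smoothed_mixture_score` and all numerical values are illustrative assumptions.

```python
import numpy as np

def smoothed_mixture_score(x, means, weights, var, sigma):
    """Score d/dx log p_sigma(x), where p_sigma = p * N(0, sigma^2) and
    p = sum_k weights[k] * N(means[k], var) is a 1-D shared-variance mixture."""
    x = np.asarray(x, dtype=float)[..., None]   # trailing axis indexes components
    total_var = var + sigma ** 2                # convolution adds the variances
    # Log of each weighted component, up to a constant shared by all components.
    log_terms = np.log(weights) - 0.5 * (x - means) ** 2 / total_var
    log_terms -= log_terms.max(axis=-1, keepdims=True)  # numerical stability
    resp = np.exp(log_terms)
    resp /= resp.sum(axis=-1, keepdims=True)    # posterior component probabilities
    # Mixture score = responsibility-weighted average of the component scores.
    return (resp * (means - x) / total_var).sum(axis=-1)

# Illustrative (assumed) setup: two well-separated modes with unit variance.
means = np.array([-5.0, 5.0])
weights = np.array([0.5, 0.5])
for sigma in (0.0, 2.0, 10.0):
    print(sigma, smoothed_mixture_score([-5.0, 0.0, 2.0], means, weights, 1.0, sigma))
```

An annealed loss in the spirit of Song and Ermon 2019 would fit such scores across a range of noise levels `sigma`; the sketch only evaluates the target at each level and does no learning.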
