Fit Like You Sample: Sample-Efficient Generalized Score Matching from Fast Mixing Markov Chains

Score matching is an approach to learning probability distributions that are parametrized only up to a constant of proportionality (e.g., Energy-Based Models). The idea is to fit the score of the distribution (the gradient of its log-density) rather than the likelihood, which avoids the need to evaluate the constant of proportionality. While there is a clear algorithmic benefit, the statistical "cost" can be steep: recent work by Koehler et al. (2022) showed that for distributions with poor isoperimetric properties (a large Poincaré or log-Sobolev constant), score matching is substantially less statistically efficient than maximum likelihood. However, many natural, realistic distributions have a poor Poincaré constant, e.g., multimodal distributions as simple as a mixture of two Gaussians in one dimension. In this paper, we show a close connection between the mixing time of an arbitrary Markov process with generator $\mathcal{L}$ and an appropriately chosen generalized score matching loss that tries to fit $\frac{\mathcal{O} p}{p}$. If $\mathcal{L}$ is the generator of a continuous version of simulated tempering, we show that the corresponding generalized score matching loss is a Gaussian-convolution annealed score matching loss, akin to the one proposed by Song and Ermon (2019). Moreover, we show that if the distribution being learned is a finite mixture of Gaussians in $d$ dimensions with a shared covariance, the sample complexity of annealed score matching is polynomial in the ambient dimension, the diameter of the means, and the smallest and largest eigenvalues of the covariance. This obviates the Poincaré-constant-based lower bounds for the basic score matching loss shown by Koehler et al. (2022). This is the first result characterizing the benefits of annealing for score matching, a crucial component in more sophisticated score-based approaches such as Song and Ermon (2019).
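
To make the objects above concrete, here is a minimal NumPy sketch of a Gaussian-convolution annealed (denoising) score matching loss in the style of Song and Ermon (2019), evaluated on a one-dimensional mixture of two Gaussians with a shared variance. It is not the paper's estimator or its generalized loss: the mixture parameters, the geometric noise schedule `np.geomspace(0.05, 5.0, 10)`, the weighting $\lambda(\sigma) = \sigma^2$, and the helper names `mixture_score` and `annealed_dsm_loss` are all illustrative assumptions. The sketch also shows why the score sidesteps the constant of proportionality: only unnormalized component terms are ever used.

```python
import numpy as np


def mixture_score(x, means, weights, var):
    """Score d/dx log q(x) of a 1-D Gaussian mixture with shared variance `var`.

    Only the unnormalized terms w_k * exp(-(x - mu_k)^2 / (2 * var)) are used:
    the shared factor (2 * pi * var)^(-1/2), and any rescaling of `weights`,
    cancel when the responsibilities are normalized, so no partition function
    is ever evaluated.
    """
    x = np.asarray(x, dtype=float)[:, None]          # shape (n, 1)
    mu = np.asarray(means, dtype=float)[None, :]     # shape (1, K)
    logits = np.log(np.asarray(weights, dtype=float))[None, :] - (x - mu) ** 2 / (2.0 * var)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stabilization
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)          # posterior responsibilities
    # grad log q = sum_k resp_k * (mu_k - x) / var
    return np.sum(resp * (mu - x), axis=1) / var


def annealed_dsm_loss(score_fn, x, sigmas, rng):
    """Annealed denoising score matching loss in the style of Song and Ermon (2019).

    At each noise level sigma, x_tilde = x + sigma * z with z ~ N(0, 1), and the
    per-level loss is 0.5 * E[(sigma * score_fn(x_tilde, sigma) + z)^2], i.e. the
    usual 0.5 * E[(s + (x_tilde - x) / sigma^2)^2] weighted by sigma^2.
    Returns the average over levels and the per-level losses.
    """
    per_level = np.empty(len(sigmas))
    for i, sigma in enumerate(sigmas):
        z = rng.standard_normal(x.shape)
        x_tilde = x + sigma * z
        residual = sigma * score_fn(x_tilde, sigma) + z
        per_level[i] = 0.5 * np.mean(residual ** 2)
    return per_level.mean(), per_level


# Hypothetical 1-D bimodal target: 0.5 * N(-3, 1) + 0.5 * N(3, 1).
rng = np.random.default_rng(0)
means = np.array([-3.0, 3.0])
base_var = 1.0
true_w = np.array([0.5, 0.5])
wrong_w = np.array([0.9, 0.1])  # same modes, misspecified mixing weights

n = 200_000
components = rng.choice(len(means), size=n, p=true_w)
x = means[components] + np.sqrt(base_var) * rng.standard_normal(n)

sigmas = np.geomspace(0.05, 5.0, num=10)  # assumed geometric noise schedule


def true_score(x_tilde, sigma):
    # Convolving the mixture with N(0, sigma^2) adds sigma^2 to each component variance.
    return mixture_score(x_tilde, means, true_w, base_var + sigma ** 2)


def wrong_score(x_tilde, sigma):
    return mixture_score(x_tilde, means, wrong_w, base_var + sigma ** 2)


# Same seed for both models, so they see identical noise draws (paired comparison).
loss_true, per_true = annealed_dsm_loss(true_score, x, sigmas, np.random.default_rng(1))
loss_wrong, per_wrong = annealed_dsm_loss(wrong_score, x, sigmas, np.random.default_rng(1))

print(f"annealed loss, true weights:  {loss_true:.4f}")
print(f"annealed loss, wrong weights: {loss_wrong:.4f}")
print("per-level excess of the wrong model:", np.round(per_wrong - per_true, 4))
```

Because the denoising objective is minimized in expectation by the exact noised score, the model with the true mixing weights should attain the lower annealed loss. With this weighting, the per-level gap is negligible at the smallest noise levels, where the mixing weights barely change the score near either mode, and larger at the coarser levels. This is only an illustration of the loss, not a reproduction of the paper's sample-complexity analysis.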

翻译：得分匹配是一种学习概率分布（如基于能量的模型）的方法，这些分布由一个比例常数参数化。其思想是拟合分布的得分而非似然，从而避免计算该比例常数的需要。尽管有明确的算法优势，但统计“代价”可能很高：Koehler等人（2022年）的最新研究表明，对于等周性质较差（具有较大的庞加莱常数或对数索博列夫常数）的分布，得分匹配在统计上远不如最大似然有效。然而，许多自然的现实分布，例如简单如一维两个高斯分布的混合分布——也具有较差的庞加莱常数。在本文中，我们展示了任意生成元为$\mathcal{L}$的马尔可夫过程的混合时间与适当选择的、旨在拟合$\frac{\mathcal{O} p}{p}$的广义得分匹配损失之间的紧密联系。如果$\mathcal{L}$对应于一个模拟退火连续版本的马尔可夫过程，我们证明相应的广义得分匹配损失是高斯卷积退火得分匹配损失，类似于Song和Ermon（2019年）提出的损失。此外，我们表明，如果被学习的分布是具有共享协方差的$d$维有限高斯混合分布，则退火得分匹配的样本复杂度在环境维度、均值直径以及协方差的最小和最大特征值上是多项式的——这避免了Koehler等人（2022年）显示的基本得分匹配损失基于庞加莱常数的下界。这是首个表征退火对得分匹配益处的结果——该技术是Song和Ermon（2019年）等更复杂得分方法的关键组成部分。