We study the problem of estimating the score function of an unknown probability distribution $\rho^*$ from $n$ independent and identically distributed observations in $d$ dimensions. Assuming that $\rho^*$ is subgaussian and has a Lipschitz-continuous score function $s^*$, we establish the optimal rate of $\tilde \Theta(n^{-\frac{2}{d+4}})$ for this estimation problem under the loss function $\|\hat s - s^*\|^2_{L^2(\rho^*)}$ that is commonly used in the score matching literature. This rate highlights the curse of dimensionality: the sample complexity of accurate score estimation grows exponentially with the dimension $d$. Leveraging key insights from empirical Bayes theory as well as a new convergence rate for the smoothed empirical distribution in Hellinger distance, we show that a regularized score estimator based on a Gaussian kernel attains this rate, which is shown to be optimal by a matching minimax lower bound. We also discuss extensions to estimating $\beta$-H\"older continuous scores with $\beta \leq 1$, as well as the implications of our theory for the sample complexity of score-based generative models.
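To make the idea of a regularized, Gaussian-kernel score estimator concrete, the following is a minimal sketch and not necessarily the exact estimator analyzed here. It assumes one common form of regularization: the gradient of a Gaussian kernel density estimate is divided by the density estimate floored at a small threshold. The function name `kde_score`, the bandwidth `h`, and the floor `eps` are hypothetical names introduced for illustration; the abstract does not specify how these parameters are chosen.

```python
import math


def kde_score(x, samples, h, eps):
    """Regularized Gaussian-kernel score estimate at a point x.

    x       : sequence of d floats (query point)
    samples : list of n sequences of d floats (observations)
    h       : kernel bandwidth, h > 0 (hypothetical tuning parameter)
    eps     : density floor, eps > 0 (hypothetical regularization parameter)

    Returns grad(p_hat)(x) / max(p_hat(x), eps) as a list of d floats,
    where p_hat is the Gaussian kernel density estimate with bandwidth h.
    """
    d = len(x)
    n = len(samples)
    # Normalizing constant of the isotropic Gaussian kernel N(0, h^2 I_d).
    norm = (2.0 * math.pi * h * h) ** (-d / 2.0)

    density = 0.0
    grad = [0.0] * d
    for xi in samples:
        diff = [a - b for a, b in zip(xi, x)]  # X_i - x
        sq_dist = sum(t * t for t in diff)
        w = norm * math.exp(-sq_dist / (2.0 * h * h))
        density += w
        # The gradient in x of the kernel centered at X_i is w * (X_i - x) / h^2.
        for j in range(d):
            grad[j] += w * diff[j] / (h * h)

    density /= n
    grad = [g / n for g in grad]

    # Flooring the denominator keeps the ratio bounded where the estimated
    # density is tiny, which is where the plain ratio would be unstable.
    denom = max(density, eps)
    return [g / denom for g in grad]
```

With a single observation at the origin, `kde_score([0.5], [[0.0]], h=1.0, eps=1e-12)` returns the score of $N(0, 1)$ at $0.5$, since the density estimate is then exactly that Gaussian. Far from all observations the estimated density underflows and the floor `eps` drives the output to zero rather than to an unstable ratio.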