Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation largely focus on light-tailed targets or rely on restrictive assumptions such as compact support, which are often violated by heavy-tailed data in practice. In this work, we study conventional (Gaussian) score-based diffusion models when the target distribution is heavy-tailed and belongs to a Sobolev class with smoothness parameter $β>0$. We consider both exponential and polynomial tail decay, indexed by a tail parameter $γ$. Using kernel density estimation, we derive sharp minimax rates for score estimation, revealing a qualitative dichotomy: under exponential tails, the rate matches the light-tailed case up to polylogarithmic factors, whereas under polynomial tails the rate depends explicitly on $γ$. We further provide sampling guarantees for the associated continuous reverse dynamics. In total variation, the generated distribution converges at the minimax optimal rate $n^{-β/(2β+d)}$ under exponential tails (up to logarithmic factors), and at a $γ$-dependent rate under polynomial tails. Whether the latter sampling rate is minimax optimal remains an open question. These results characterize the statistical limits of score estimation and the resulting sampling accuracy for heavy-tailed targets, extending diffusion theory beyond the light-tailed setting.
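As a concrete illustration of the kernel-density-estimation approach mentioned above, the plug-in score estimator $\nabla \log \hat p(x)$ for a Gaussian KDE can be sketched as follows. This is a minimal sketch under illustrative assumptions (Gaussian kernel, a single fixed bandwidth `h`); the function name and interface are not taken from the paper, and the paper's actual estimator and bandwidth selection may differ.

```python
import numpy as np

def kde_score(x, data, h):
    """Plug-in score estimate grad log p_hat(x) for a Gaussian KDE.

    Illustrative sketch: p_hat(x) = (1/n) * sum_i K_h(x - X_i) with a
    Gaussian kernel of bandwidth h. Differentiating log p_hat gives a
    softmax-weighted average of (X_i - x) / h^2.
    """
    x = np.atleast_1d(x).astype(float)
    diffs = data - x                                  # shape (n, d)
    # Log kernel weights, stabilized before exponentiation.
    log_w = -np.sum(diffs**2, axis=1) / (2.0 * h**2)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    # Weighted average of per-sample score contributions.
    return (w[:, None] * diffs).sum(axis=0) / h**2
```

For a single data point at the origin with `h = 1`, the estimator reduces to the exact Gaussian score `-x`, which is a quick sanity check on the formula.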