Sampling from high-dimensional, non-log-concave distributions with unnormalized densities is a fundamental challenge in machine learning, particularly when the exact gradient of the potential is unavailable and must be approximated by stochastic gradients, which exhibit high variance under a fixed budget of gradient computations per iteration. Although variance reduction techniques such as SGD with momentum, STORM, and PAGE have demonstrated improved convergence properties in non-convex optimization, their implications for sampling from non-log-concave distributions remain largely unexplored. In this work, we develop the first unified analysis of these estimators for sampling from non-log-concave distributions. We establish improved non-asymptotic convergence rates in $\varepsilon$-relative Fisher information and, under a Poincaré inequality assumption, in squared total variation distance, and further prove weak convergence to the target distribution. We extend our analysis to solving inverse problems with score-based generative priors. We empirically validate our theory and demonstrate that, under a fixed budget of gradient computations per iteration, variance reduction techniques consistently improve sample quality in two standard imaging applications.