We consider sampling from a Gibbs distribution by evolving a finite number of particles with a particular score estimator in place of Brownian motion. To accelerate the particles, we adopt a second-order score-based ODE in the spirit of Nesterov acceleration. In contrast to traditional kernel density score estimation, we use the recently proposed regularized Wasserstein proximal method, yielding the Accelerated Regularized Wasserstein Proximal method (ARWP). We provide a detailed analysis of continuous- and discrete-time, non-asymptotic and asymptotic mixing rates for Gaussian initial and target distributions, using techniques from Euclidean acceleration and accelerated information gradients. Compared with kinetic Langevin sampling, the proposed algorithm exhibits a higher contraction rate in the asymptotic time regime. Numerical experiments on low-dimensional benchmarks, including multi-modal Gaussian mixtures and ill-conditioned Rosenbrock distributions, show that ARWP produces structured, convergent particle trajectories, accelerated discrete-time mixing, and faster tail exploration than the non-accelerated regularized Wasserstein proximal method and kinetic Langevin methods. ARWP particles also exhibit better generalization on some non-log-concave Bayesian neural network tasks.
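The update described above, particles driven by a damped second-order ODE whose Brownian noise is replaced by a deterministic score term, can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the function names (`rwp_score`, `arwp_step`), the softmax kernel form of the score estimate, and all parameter choices are assumptions made for the sketch.

```python
import numpy as np

def rwp_score(X, V, T=0.1, beta=1.0):
    """Score estimate in the spirit of the regularized Wasserstein proximal
    method: softmax weights over particle pairs (a sketch; the paper's exact
    kernel formula may differ)."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    logits = -beta * (V(X)[None, :] + D2 / (2.0 * T))
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    W = np.exp(logits)
    W /= W.sum(axis=1, keepdims=True)
    # Approximate the gradient of log of the smoothed density at each particle.
    return (W @ X - X) / T

def arwp_step(X, P, grad_V, V, dt=0.05, gamma=1.0, beta=1.0, T=0.1):
    """One Nesterov-like second-order update: velocity P is damped by gamma and
    driven by -grad V plus a deterministic score term replacing Brownian motion."""
    drift = -grad_V(X) - rwp_score(X, V, T=T, beta=beta) / beta
    P = P + dt * (-gamma * P + drift)   # velocity update with damping
    X = X + dt * P                      # position update (semi-implicit Euler)
    return X, P
```

For a standard Gaussian target, one would take `V = lambda Z: 0.5 * (Z ** 2).sum(-1)` and `grad_V = lambda Z: Z` and iterate `arwp_step`; the particle ensemble then drifts toward the mode while the score term keeps it spread out, mimicking the diffusion it replaces.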