We consider sampling from a Gibbs distribution by evolving finitely many particles. We propose a preconditioned version of a recently introduced noise-free sampling method, in which the score function is approximated by the numerically tractable score of a regularized Wasserstein proximal operator. The operator is derived by applying a Cole--Hopf transformation to coupled anisotropic heat equations, which yields a kernel formulation of the preconditioned regularized Wasserstein proximal. The diffusion component of the proposed method can also be interpreted as a modified self-attention block, as found in transformer architectures. For quadratic potentials, we provide a discrete-time non-asymptotic convergence analysis and explicitly characterize the bias, which depends on the regularization and is independent of the step size. Experiments demonstrate acceleration and particle-level stability on examples ranging from log-concave and non-log-concave toy problems to Bayesian total-variation regularized image deconvolution, as well as competitive or better performance on non-convex Bayesian neural network training when variable preconditioning matrices are used.
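To make the idea of a noise-free, preconditioned particle update concrete, the following is a minimal sketch and not the method of this paper. It evolves particles by x <- x - step * M (grad V(x) + (1/beta) * score(x)), where the score of the current particle density is approximated by a plain Gaussian kernel density estimate. That estimate is a stand-in assumption for the regularized Wasserstein proximal score, chosen only because it is simple and shows the softmax-weighted (attention-like) interaction between particles. The function names, the bandwidth, the potential, and the preconditioner `M` are all hypothetical choices for illustration.

```python
import numpy as np


def kde_score(x, bandwidth):
    """Gradient of the log of a Gaussian KDE built on x, evaluated at each particle.

    Assumption: this KDE score replaces the paper's regularized Wasserstein
    proximal score purely for illustration.
    """
    diff = x[:, None, :] - x[None, :, :]            # (n, n, d), entry [i, j] = x_i - x_j
    logits = -np.sum(diff ** 2, axis=-1) / (2.0 * bandwidth ** 2)
    logits -= logits.max(axis=1, keepdims=True)     # stabilise the exponentials
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax over particles
    # grad log rho(x_i) = -sum_j w_ij (x_i - x_j) / h^2
    return -np.einsum("ij,ijd->id", weights, diff) / bandwidth ** 2


def preconditioned_step(x, grad_V, M, beta, step, bandwidth):
    """One deterministic update: x <- x - step * M (grad V + score / beta)."""
    drift = grad_V(x) + kde_score(x, bandwidth) / beta
    return x - step * drift @ M.T                   # applies M to every row of drift


# Hypothetical quadratic potential V(x) = 0.5 * x^T A x, with M = A^{-1}.
A = np.diag([1.0, 4.0])
M = np.linalg.inv(A)


def grad_V(x):
    return x @ A                                    # A is symmetric


rng = np.random.default_rng(0)
particles = rng.normal(loc=3.0, scale=1.0, size=(200, 2))
for _ in range(300):
    particles = preconditioned_step(
        particles, grad_V, M, beta=1.0, step=0.1, bandwidth=0.3
    )
```

With `M` equal to the inverse Hessian of this quadratic potential, the drift term contracts every coordinate at the same rate, which is the usual motivation for preconditioning. The KDE bandwidth introduces its own bias, which is separate from the regularization bias analyzed in the paper.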