Sampling from the posterior is a key technical problem in Bayesian statistics. Rigorous guarantees are difficult to obtain for the Markov chain Monte Carlo algorithms in common use. In this paper, we study an alternative class of algorithms based on diffusion processes. The diffusion is constructed in such a way that, at its final time, it approximates the target posterior distribution. The stochastic differential equation that defines this process is discretized (using an Euler scheme) to provide an efficient sampling algorithm. Our construction of the diffusion is based on the notion of an observation process and the related idea of stochastic localization. Namely, the diffusion process describes a sample that is conditioned on increasing information. An overlapping family of processes was derived in the machine learning literature via time-reversal. We apply this method to posterior sampling in the high-dimensional symmetric spiked model. We observe a rank-one matrix ${\boldsymbol \theta}{\boldsymbol \theta}^{\sf T}$ corrupted by Gaussian noise, and want to sample ${\boldsymbol \theta}$ from the posterior. Our sampling algorithm makes use of an oracle that computes the posterior expectation of ${\boldsymbol \theta}$ given the data and the additional observation process. We provide an efficient implementation of this oracle using approximate message passing. We thus develop the first sampling algorithm for this problem with approximation guarantees.
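
To make the construction concrete, the following is a minimal sketch of the Euler-discretized diffusion on a toy problem, not the spiked-model algorithm of the paper. It assumes a scalar target $\theta$ uniform on $\{-1,+1\}$ and an observation process $y_t = t\theta + W_t$, for which the posterior mean is exactly $\tanh(y_t)$; this closed form stands in for the approximate message passing oracle. The function names `posterior_mean` and `sample_by_localization` and the values of `T` and `n_steps` are illustrative assumptions.

```python
import numpy as np


def posterior_mean(y, t):
    """Toy oracle (assumption): E[theta | y_t = y] for theta uniform on {-1, +1}.

    With y_t = t * theta + W_t, the posterior weights are proportional to
    exp(+y) and exp(-y), so the mean is tanh(y). The argument t is kept only
    to mirror the general oracle signature; it is unused for this prior.
    """
    return np.tanh(y)


def sample_by_localization(n_samples, T=20.0, n_steps=2000, seed=0):
    """Euler scheme for dy_t = m(y_t, t) dt + dB_t, started at y_0 = 0.

    Returns m(y_T, T), which concentrates near a sample of theta as T grows.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y = np.zeros(n_samples)
    for k in range(n_steps):
        t = k * dt
        drift = posterior_mean(y, t)  # oracle call at the current state
        noise = np.sqrt(dt) * rng.standard_normal(n_samples)
        y = y + drift * dt + noise  # one Euler step
    return posterior_mean(y, T)


# Run many independent chains in parallel (vectorized over samples).
samples = sample_by_localization(10_000)
```

In this toy setting the outputs should cluster near $+1$ and $-1$ in roughly equal proportion, reflecting the symmetric target. In the high-dimensional setting of the paper, `posterior_mean` would be replaced by an oracle that also conditions on the observed matrix.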