In this work, we present a sampling algorithm for single-hidden-layer neural networks. The algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm tackles this problem by coupling the posterior density of $w$ with an auxiliary random variable $\xi$. The resulting reverse conditional $w|\xi$ of the neuron weights given the auxiliary random variable is shown to be log-concave. The construction of the posterior distributions allows some freedom in the choice of prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log-concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log-concave for sufficiently large $d$. The score of the marginal density of $\xi$ is an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov chain Monte Carlo methods. Moreover, computation of this score permits sampling $\xi$ by a stochastic diffusion (Langevin dynamics) whose drift is built from the score. For such dynamics, information-theoretic methods pioneered by Bakry and Émery show that accurate samples of $\xi$ are obtained rapidly when its density is indeed strictly log-concave. A final draw from $w|\xi$ then produces neuron weights $w$ whose marginal distribution is the desired posterior.
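The two-stage scheme described above (estimate the score of $\xi$ as an expectation over the log-concave reverse conditional $w|\xi$, drive Langevin dynamics on $\xi$ with that score, then take one final draw from $w|\xi$) can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it uses a made-up one-dimensional Gaussian pair $(w, \xi)$, where $w|\xi$ is Gaussian and is sampled exactly in place of an MCMC step, purely to show the mechanics of score estimation by conditional expectation and unadjusted Langevin dynamics.

```python
import numpy as np

# Toy model (illustrative assumption, not from the paper):
#   prior        w   ~ N(0, 1)
#   auxiliary    xi|w ~ N(a*w, 1)
# Then the marginal of xi is N(0, 1 + a^2), strictly log-concave, and the
# reverse conditional w|xi is Gaussian, hence log-concave and trivial to
# sample -- a stand-in for the rapidly mixing MCMC step in the abstract.
rng = np.random.default_rng(0)
a = 2.0

def sample_w_given_xi(xi, n):
    # Exact draw from w | xi ~ N(a*xi/(1+a^2), 1/(1+a^2)).
    mean = a * xi / (1.0 + a**2)
    return rng.normal(mean, np.sqrt(1.0 / (1.0 + a**2)), size=n)

def score_xi(xi, n_mc=64):
    # Score of the marginal of xi as a conditional expectation:
    #   d/dxi log p(xi) = E[ d/dxi log p(xi|w) | xi ] = E[ -(xi - a*w) | xi ],
    # estimated by Monte Carlo over draws of w | xi.
    w = sample_w_given_xi(xi, n_mc)
    return np.mean(-(xi - a * w))

def langevin_xi(n_steps=20000, step=0.05):
    # Unadjusted Langevin dynamics on xi with drift given by the
    # estimated score of its marginal density.
    xi = 0.0
    trace = np.empty(n_steps)
    for t in range(n_steps):
        xi += step * score_xi(xi) + np.sqrt(2.0 * step) * rng.normal()
        trace[t] = xi
    return trace

trace = langevin_xi()
# One final draw from w | xi yields a weight sample whose marginal
# approximates the target distribution of w.
w_final = sample_w_given_xi(trace[-1], 1)[0]
```

In this toy case the marginal score works out to $-\xi/(1+a^2)$, so the Langevin chain should equilibrate near $N(0, 1+a^2)$; the empirical variance of `trace` approaching $1+a^2$ is an easy sanity check on the construction.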