In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model that adds noise to every pre- and post-activation in the network, and we argue that the resulting posterior can be sampled with an efficient Gibbs sampler. On both real and synthetic data, the Gibbs sampler attains performance similar to that of state-of-the-art Markov chain Monte Carlo methods, such as Hamiltonian Monte Carlo and the Metropolis-adjusted Langevin algorithm. By framing our analysis in the teacher-student setting, we introduce a thermalization criterion that detects when an algorithm, run on data with synthetic labels, fails to sample from the posterior. The criterion relies on the fact that in the teacher-student setting an algorithm can be initialized directly at equilibrium.
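As an illustration of the teacher-student construction, the sketch below draws a teacher from a noisy one-hidden-layer network and generates synthetic labels from it. It is a minimal sketch under stated assumptions, not the model as specified in the paper: the Gaussian prior, the ReLU activation, the single shared noise variance, and the names `sample_teacher`, `noise_var`, and `prior_var` are all hypothetical choices made for the example.

```python
import numpy as np

def sample_teacher(X, hidden, noise_var=1e-2, prior_var=1.0, seed=0):
    """Draw teacher weights and latent variables, then synthetic labels.

    Assumptions (not taken from the source): Gaussian prior on the weights,
    ReLU activation, and the same Gaussian noise variance at every layer.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    s = np.sqrt(noise_var)

    # Teacher weights drawn from the prior.
    W1 = rng.normal(0.0, np.sqrt(prior_var), size=(d, hidden))
    w2 = rng.normal(0.0, np.sqrt(prior_var), size=hidden)

    # Noise added to the pre-activations.
    Z = X @ W1 + s * rng.normal(size=(n, hidden))
    # Noise added to the post-activations.
    H = np.maximum(Z, 0.0) + s * rng.normal(size=(n, hidden))
    # Noisy output, used as the synthetic labels.
    y = H @ w2 + s * rng.normal(size=n)

    state = {"W1": W1, "w2": w2, "Z": Z, "H": H}
    return state, y

# Synthetic inputs for the example.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
state, y = sample_teacher(X, hidden=8)
```

Because `state` and `y` are drawn jointly from the model, `state` is an exact sample from the posterior given `(X, y)`. A sampler started from `state` therefore begins at equilibrium, and one plausible way to apply the criterion is to compare an observable (for example the test error) along that run with the same observable along a run started from a random initialization: if the two do not converge to the same value, the randomly initialized run has not thermalized.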