In this paper, we study sampling from a posterior derived from a neural network. We propose a new probabilistic model that adds noise at every pre- and post-activation in the network, and we argue that the resulting posterior can be sampled with an efficient Gibbs sampler. For small models, the Gibbs sampler attains performance comparable to that of state-of-the-art Markov chain Monte Carlo (MCMC) methods, such as Hamiltonian Monte Carlo (HMC) and the Metropolis-adjusted Langevin algorithm (MALA), on both real and synthetic data. By framing our analysis in the teacher-student setting, we introduce a thermalization criterion that detects when an algorithm run on data with synthetic labels fails to sample from the posterior. The criterion relies on the fact that, in the teacher-student setting, an algorithm can be initialized directly at equilibrium.
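As an illustration of the teacher-student idea, the following is a minimal sketch and not the model or criterion as specified in the paper. It assumes a one-hidden-layer ReLU network, Gaussian priors on the weights, and Gaussian noise of a single standard deviation `noise_std` at the pre-activation, the post-activation, and the output; all of these choices, along with the function names `sample_teacher` and `thermalization_gap`, are hypothetical. Because the labels are generated by the teacher itself, the teacher's weights and latent variables can serve as an initialization at equilibrium. A chain started there can then be compared with a chain started from an uninformed initialization by tracking some scalar observable along both runs.

```python
import numpy as np

def sample_teacher(n, d, h, noise_std, rng):
    """Draw teacher weights from a Gaussian prior and generate noisy labels.

    Assumptions (not from the source): ReLU activation, Gaussian prior,
    Gaussian noise with the same standard deviation at every stage.
    """
    X = rng.standard_normal((n, d))
    W1 = rng.standard_normal((d, h)) / np.sqrt(d)
    w2 = rng.standard_normal(h) / np.sqrt(h)
    # Noisy pre-activation of the hidden layer.
    Z = X @ W1 + noise_std * rng.standard_normal((n, h))
    # Noisy post-activation of the hidden layer.
    A = np.maximum(Z, 0.0) + noise_std * rng.standard_normal((n, h))
    # Noisy output, used as the synthetic label.
    y = A @ w2 + noise_std * rng.standard_normal(n)
    # The teacher's own variables form the informed initialization.
    return X, y, {"W1": W1, "w2": w2, "Z": Z, "A": A}

def thermalization_gap(trace_informed, trace_uninformed, window):
    """Absolute difference between the trailing means of two observable traces."""
    a = np.mean(np.asarray(trace_informed, dtype=float)[-window:])
    b = np.mean(np.asarray(trace_uninformed, dtype=float)[-window:])
    return abs(a - b)

rng = np.random.default_rng(0)
X, y, teacher = sample_teacher(n=50, d=10, h=5, noise_std=0.1, rng=rng)
```

In this sketch, a sampler of one's choice would be run twice on `(X, y)`, once from `teacher` and once from a random state, recording an observable such as the test error at each step. A gap that stays well above the fluctuations of the informed trace suggests that the uninformed chain has not yet thermalized.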