We derive a Markov chain Monte Carlo sampler based on following ray paths in a medium where the refractive index $n(x)$ is a function of the desired likelihood $\mathcal{L}(x)$. The sampling method propagates rays at constant speed through parameter space, leading to orders of magnitude higher resilience to heating from stochastic gradients than Hamiltonian Monte Carlo (HMC), as well as the ability to cross any likelihood barrier, including holes in parameter space. Using the ray tracing method, we sample the posterior distributions of neural network outputs for a variety of architectures, up to the 1.5 billion-parameter GPT-2 (Generative Pre-trained Transformer 2) architecture, all on a single consumer-level GPU. We also show that prior samplers, including traditional HMC, microcanonical HMC, Metropolis, Gibbs, and even Monte Carlo integration, are special cases within a generalized ray tracing framework, which can sample according to an arbitrary weighting function. Public code and documentation for C, JAX, and PyTorch are available at https://bitbucket.org/pbehroozi/ray-tracing-sampler/src
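To make the constant-speed propagation concrete, the following is a minimal sketch of a single ray bending through a medium, not the authors' implementation. It integrates the standard ray equation of geometric optics, $\frac{d}{ds}\left(n \frac{dx}{ds}\right) = \nabla n$, which for a unit direction $u = dx/ds$ becomes $\frac{du}{ds} = (I - u u^{T}) \nabla \ln n$. Several things here are assumptions made purely for illustration: the choice $\ln n = \ln \mathcal{L} / (d - 1)$, which is borrowed from the form of isokinetic dynamics and may differ from the mapping used in the paper; the standard Gaussian test likelihood; the simple Euler step with renormalization; and the hypothetical names `grad_log_likelihood` and `trace_ray`. A full sampler would also need a rule for randomizing ray directions, which is omitted.

```python
import math


def grad_log_likelihood(x):
    """Gradient of ln L for an illustrative standard Gaussian, L(x) ∝ exp(-|x|^2 / 2)."""
    return [-xi for xi in x]


def trace_ray(x, u, step, n_steps, grad_log_l=grad_log_likelihood):
    """Euler-integrate dx/ds = u, du/ds = (I - u u^T) grad(ln n).

    Assumes ln n = ln L / (d - 1) (an illustrative choice) and d >= 2.
    Returns the list of visited positions and the final unit direction.
    """
    d = len(x)
    path = [list(x)]
    for _ in range(n_steps):
        # Gradient of ln n under the assumed mapping from likelihood to index.
        g = [gi / (d - 1) for gi in grad_log_l(x)]
        # Component of the gradient along the ray; only the rest bends the ray.
        g_par = sum(gi * ui for gi, ui in zip(g, u))
        u = [ui + step * (gi - g_par * ui) for ui, gi in zip(u, g)]
        # Renormalize so the direction stays a unit vector (constant speed).
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]
        # Advance a fixed arc length along the current direction.
        x = [xi + step * ui for xi, ui in zip(x, u)]
        path.append(list(x))
    return path, u


if __name__ == "__main__":
    path, u = trace_ray([3.0, 0.0, 0.0], [0.0, 1.0, 0.0], step=0.01, n_steps=500)
    print("final position:", path[-1])
    print("final direction:", u)
```

Because the direction is renormalized at every step, each segment of the path has length exactly `step` regardless of how steep the likelihood gradient is; the gradient only changes where the ray points, never how fast it moves. To use a different likelihood, pass a different `grad_log_l` callable.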