The Hamiltonian Monte Carlo (HMC) algorithm is often lauded for its ability to effectively sample from high-dimensional distributions. In this paper we challenge the presumed dominance of HMC for the Bayesian analysis of GLMs. By utilizing the structure of the compute graph rather than the graphical model, we show a reduction of the time per sweep of a full-scan Gibbs sampler from $O(d^2)$ to $O(d)$, where $d$ is the number of GLM parameters. A simple change to the implementation of the Gibbs sampler allows us to perform Bayesian inference on high-dimensional GLMs that are practically infeasible with traditional Gibbs sampler implementations. We empirically demonstrate a substantial increase in effective sample size per unit time when comparing our Gibbs algorithms to state-of-the-art HMC algorithms. While Gibbs is superior in terms of dimension scaling, neither Gibbs nor HMC dominates the other: we provide numerical and theoretical evidence that HMC retains an edge in certain circumstances thanks to its advantageous condition number scaling. Interestingly, for GLMs of fixed data size, we observe that increasing dimensionality can stabilize or even decrease the condition number, shedding light on the empirical advantage of our efficient Gibbs sampler.
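A minimal sketch of the kind of implementation change the abstract alludes to (this is an illustrative assumption, not the paper's actual algorithm): in a GLM the likelihood depends on the parameters only through the linear predictor $\eta = X\beta$, so a coordinate-wise Gibbs-style sampler can cache $\eta$ and update it incrementally when a single coordinate changes. Each coordinate update then costs $O(n)$ rather than $O(nd)$, shaving a factor of $d$ off every full sweep. The example below uses a simple Metropolis-within-Gibbs scan on a logistic regression with standard normal priors; all names and tuning constants are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20

# Simulated logistic-regression data (illustrative only)
X = rng.normal(size=(n, d)) / np.sqrt(d)
beta_true = rng.normal(size=d)
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true)))

beta = np.zeros(d)
eta = X @ beta  # cached linear predictor, kept in sync incrementally


def log_post(eta, bj):
    """Log posterior up to terms constant in coordinate j:
    logistic log-likelihood plus a N(0, 1) prior on that coordinate."""
    return y @ eta - np.sum(np.log1p(np.exp(eta))) - 0.5 * bj**2


for sweep in range(50):
    for j in range(d):
        prop = beta[j] + 0.5 * rng.normal()
        # Incremental cache update: O(n) per coordinate, not O(n d)
        eta_prop = eta + X[:, j] * (prop - beta[j])
        if np.log(rng.uniform()) < log_post(eta_prop, prop) - log_post(eta, beta[j]):
            beta[j], eta = prop, eta_prop
```

The naive alternative recomputes `X @ beta` inside every coordinate update, which is where the extra factor of $d$ per sweep comes from; the cached version touches only one column of `X` per update.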