Sparse regression based on global-local shrinkage priors is increasingly used for Bayesian modeling of modern high-dimensional data, but scaling up the Gibbs sampler for posterior inference remains a challenge. While much effort has gone into speeding up the high-dimensional coefficient update step, insufficient attention has been given to the potentially poor mixing of the global scale parameter $\tau$ and of the overall sampler. One proposed remedy is to marginalize out the coefficients when updating $\tau$. Here we show that, although this collapsed update was previously thought to require a Metropolis step, we can in fact sample directly and efficiently from the collapsed density. This is made possible by careful linear algebraic manipulations and a strategic spectral decomposition performed once per Gibbs scan, after which the collapsed density can be evaluated at hundreds of values of $\tau$ at negligible cost. We combine this computational trick with adaptive numerical integration and inverse transform sampling to construct a direct sampler. This eliminates the need to tune Metropolis proposals and yields faster convergence and improved mixing. We demonstrate our method on two big data applications, fitting logistic regression under the horseshoe prior to datasets with design matrices of size 120,000 × 1,379 and 1,980 × 17,848.
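The collapsed update described above can be illustrated in a simplified setting. The following is a minimal sketch, not the authors' implementation. It assumes a Gaussian linear model with known noise variance `sigma2` (rather than the logistic model used in the applications), coefficients $\beta_j \sim N(0, \tau^2 \lambda_j^2)$, a half-Cauchy prior on $\tau$, and a fixed trapezoid grid in place of adaptive numerical integration. Under these assumptions, $y \mid \tau \sim N(0, \sigma^2 I + \tau^2 X \Lambda X^\top)$, so a single eigendecomposition of $X \Lambda X^\top$ makes each subsequent density evaluation cheap. All function and variable names are hypothetical.

```python
import numpy as np

def collapsed_log_density_on_grid(X, y, lam, sigma2, tau_grid):
    """Unnormalized log p(tau | y, lam) on a grid, with coefficients integrated out.

    Assumes y ~ N(0, sigma2 * I + tau^2 * X diag(lam^2) X^T) and a
    half-Cauchy(0, 1) prior on tau (both are assumptions of this sketch).
    """
    XL = X * lam                          # scales column j of X by lam[j]
    # One eigendecomposition, reused for every tau on the grid.
    d, U = np.linalg.eigh(XL @ XL.T)
    d = np.clip(d, 0.0, None)             # guard against tiny negative eigenvalues
    z2 = (U.T @ y) ** 2                   # squared data in the eigenbasis
    var = sigma2 + np.outer(tau_grid ** 2, d)          # shape (n_grid, n)
    log_lik = -0.5 * (np.log(var) + z2 / var).sum(axis=1)
    log_prior = -np.log1p(tau_grid ** 2)               # half-Cauchy, up to a constant
    return log_lik + log_prior

def inverse_transform_sample(grid, log_p, rng, size=1):
    """Draw from a 1-D density given by unnormalized log values on a grid."""
    p = np.exp(log_p - log_p.max())       # subtract the max to avoid overflow
    # Trapezoid rule for the cumulative integral, then normalize to a CDF.
    steps = 0.5 * (p[1:] + p[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]
    # Invert the CDF by linear interpolation at uniform draws.
    return np.interp(rng.uniform(size=size), cdf, grid)

# Small synthetic demonstration (data are simulated, not from the paper).
rng = np.random.default_rng(0)
n, p = 40, 100
X = rng.normal(size=(n, p))
lam = rng.uniform(0.5, 2.0, size=p)       # stand-in for the local scales
y = rng.normal(size=n)
tau_grid = np.geomspace(1e-3, 10.0, 500)  # hundreds of tau values
log_p = collapsed_log_density_on_grid(X, y, lam, 1.0, tau_grid)
tau_draw = inverse_transform_sample(tau_grid, log_p, rng)[0]
```

In this sketch the eigendecomposition costs $O(n^3)$ once, while each grid point afterwards costs only $O(n)$, which is why evaluating the density at hundreds of values of $\tau$ adds little to the cost of a scan. The fixed grid is a simplification: an adaptive rule would concentrate evaluations where the density has most of its mass.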