We study Bayesian group-regularized estimation in high-dimensional generalized linear models (GLMs) under a continuous spike-and-slab prior. Our framework covers both canonical and non-canonical link functions and subsumes logistic, Poisson, negative binomial, and Gaussian regression with group sparsity. We show that both the maximum a posteriori (MAP) estimator and the full posterior distribution attain the minimax L2 convergence rate under our prior. Our theoretical results thus justify the use of the posterior mode as a point estimator. Moreover, the posterior distribution contracts at the same rate as the MAP estimator, an attractive property that the group lasso does not share. For computation, we propose expectation-maximization (EM) and Markov chain Monte Carlo (MCMC) algorithms. We illustrate our method through simulations and a real data application on predicting human immunodeficiency virus (HIV) drug resistance from protein sequences.
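For orientation, the setting can be sketched in generic notation. This is only an illustrative formulation under assumed symbols: the abstract does not state the exact prior or parameterization, so the group index $g$, the mixing weight $\theta$, and the densities $\psi_{\mathrm{spike}}$ and $\psi_{\mathrm{slab}}$ below are placeholders rather than the paper's definitions.

```latex
% Assumed notation (not from the abstract): n observations, G groups of covariates,
% link function h, mixing weight \theta, generic spike and slab densities.
\begin{align*}
  % GLM with grouped covariates: the link maps the mean to a sum of group effects
  h\bigl(\mathbb{E}[y_i \mid x_i]\bigr)
    &= \sum_{g=1}^{G} x_{ig}^{\top} \beta_g, \qquad i = 1, \dots, n, \\
  % Continuous spike-and-slab prior: a two-component mixture on each group
  \pi(\beta \mid \theta)
    &= \prod_{g=1}^{G} \Bigl[ (1-\theta)\, \psi_{\mathrm{spike}}(\beta_g)
       + \theta\, \psi_{\mathrm{slab}}(\beta_g) \Bigr], \\
  % MAP estimator: the posterior mode, i.e. penalized log-likelihood maximizer
  \widehat{\beta}_{\mathrm{MAP}}
    &= \arg\max_{\beta} \Bigl\{ \ell_n(\beta) + \log \pi(\beta \mid \theta) \Bigr\}.
\end{align*}
```

Here $\ell_n(\beta)$ denotes the GLM log-likelihood. Group sparsity means that most of the vectors $\beta_g$ are zero; the spike component concentrates such groups near zero while the slab leaves the active groups comparatively unshrunk.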