We study Bayesian group-regularized estimation in high-dimensional generalized linear models (GLMs) under a continuous spike-and-slab prior. Our framework covers both canonical and non-canonical link functions and subsumes logistic, Poisson, Gaussian, and negative binomial regression with group sparsity. Under assumptions milder than those previously imposed for the group lasso, we derive convergence rates for both the maximum a posteriori (MAP) estimator and the full posterior distribution. These results justify the use of the posterior mode as a point estimator. Furthermore, the posterior distribution contracts at the same rate as the MAP estimator, an attractive property that does not hold for the group lasso. For computation, we propose an expectation-maximization (EM) algorithm that rapidly obtains MAP estimates under our model. We illustrate our method through simulations and a real-data application to predicting human immunodeficiency virus (HIV) drug resistance from protein sequences.