We develop a fast and accurate grouped penalized credible region approach for variable selection and prediction in Bayesian high-dimensional linear regression. Most existing Bayesian methods are either subject to high computational costs due to long Markov chain Monte Carlo runs or yield ambiguous variable selection results due to non-sparse solution output. The penalized credible region framework yields sparse post-processed estimates that facilitate unambiguous grouped variable selection. High estimation accuracy is achieved by shrinking noise from unimportant groups using a grouped global-local shrinkage prior. To ensure computational scalability, we approximate posterior summaries using coordinate ascent variational inference and recast the penalized credible region framework as a convex optimization problem that admits efficient computation. We prove that the resulting post-processed estimators are both parameter-consistent and variable selection consistent in high-dimensional settings. Theory is developed to justify running the coordinate ascent algorithm for at least two cycles. Through extensive simulations, we demonstrate that our proposed method outperforms state-of-the-art methods in grouped variable selection, prediction, and computation time for several common models, including ANOVA and nonparametric varying coefficient models.
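The post-processing idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: a ridge estimate stands in for the coordinate ascent variational inference posterior mean, and a single group-wise soft-thresholding step stands in for the penalized credible region optimization. All variable names, the threshold value, and the toy data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 groups of 4 coefficients each; only group 0 is active.
n, p = 50, 12
groups = [range(0, 4), range(4, 8), range(8, 12)]
beta_true = np.zeros(p)
beta_true[:4] = [2.0, -1.5, 1.0, 0.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + 0.5 * rng.standard_normal(n)

# Stand-in for the (non-sparse) variational posterior mean: a ridge fit.
lam_ridge = 1.0
beta_hat = np.linalg.solve(X.T @ X + lam_ridge * np.eye(p), X.T @ y)

# Group-wise soft thresholding (proximal operator of a group penalty):
# each group's coefficient block is shrunk toward zero, and groups whose
# norm falls below the threshold are zeroed out entirely, which is what
# makes the grouped selection unambiguous.
tau = 1.0  # illustrative threshold; the paper ties this to credible regions
beta_sparse = np.zeros(p)
for g in groups:
    idx = list(g)
    norm_g = np.linalg.norm(beta_hat[idx])
    if norm_g > tau:
        beta_sparse[idx] = (1.0 - tau / norm_g) * beta_hat[idx]

selected = [j for j, g in enumerate(groups)
            if np.linalg.norm(beta_sparse[list(g)]) > 0]
print("selected groups:", selected)
```

The sketch shows why the output is sparse by construction: entire groups are either retained (with shrinkage) or set exactly to zero, so group selection reads directly off the post-processed estimate rather than requiring a cutoff on posterior summaries.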