This paper introduces a novel Bayesian approach for variable selection in high-dimensional and potentially sparse regression settings. Our method replaces the indicator variables in the traditional spike and slab prior with continuous, Beta-distributed random variables and places half-Cauchy priors over the parameters of the Beta distribution, which significantly improves the predictive and inferential performance of the technique. As with shrinkage methods, our continuous parameterization of the spike and slab prior enables us to explore the posterior distributions of interest using fast gradient-based methods, such as Hamiltonian Monte Carlo (HMC), while at the same time explicitly allowing for variable selection in a principled framework. We study the frequentist properties of our model via simulation and show that our technique outperforms the latest Bayesian variable selection methods in both linear and logistic regression. The efficacy, applicability, and performance of our approach are further underscored through its implementation on real datasets.
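To make the prior structure described above concrete, the following is a minimal sketch of drawing from a continuous spike and slab prior of this kind. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact hierarchy, so the Gaussian slab, the product form `beta_j = lam_j * w_j`, the unit half-Cauchy scale, the small floor on the Beta parameters, and the 0.5 selection threshold are all hypothetical choices, as are the function names `half_cauchy`, `sample_prior`, and `select`.

```python
import math
import random


def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) distribution via its inverse CDF."""
    u = rng.random()  # uniform on [0, 1)
    return scale * math.tan(math.pi * u / 2.0)


def sample_prior(p, rng, slab_sd=1.0, hyper_scale=1.0, floor=1e-2):
    """Draw one set of p coefficients from an assumed continuous spike and slab prior.

    Assumed hierarchy (hypothetical, for illustration only):
        a, b   ~ half-Cauchy(0, hyper_scale)
        lam_j  ~ Beta(a, b)            continuous stand-in for the 0/1 indicator
        w_j    ~ Normal(0, slab_sd^2)  slab component
        beta_j = lam_j * w_j
    """
    # Beta parameters must be strictly positive; the floor guards against a draw of 0.
    a = max(half_cauchy(rng, hyper_scale), floor)
    b = max(half_cauchy(rng, hyper_scale), floor)
    lam = [rng.betavariate(a, b) for _ in range(p)]
    slab = [rng.gauss(0.0, slab_sd) for _ in range(p)]
    beta = [l * w for l, w in zip(lam, slab)]
    return a, b, lam, slab, beta


def select(lam_means, threshold=0.5):
    """Return indices whose (posterior mean) inclusion weight exceeds the threshold.

    The 0.5 cut-off is an assumed rule, not one stated in the abstract.
    """
    return [j for j, m in enumerate(lam_means) if m > threshold]


if __name__ == "__main__":
    rng = random.Random(0)
    a, b, lam, slab, beta = sample_prior(10, rng)
    print("Beta parameters:", a, b)
    print("Inclusion weights:", [round(l, 3) for l in lam])
    print("Coefficients:", [round(x, 3) for x in beta])
```

Because every `lam_j` lies in the unit interval rather than in {0, 1}, the resulting density has no discrete components, which is what allows gradient-based samplers such as HMC to be applied. In this sketch, a weight near 0 shrinks the coefficient toward the spike and a weight near 1 leaves the slab draw essentially unchanged.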