Revisiting Bayesian Variable Selection via Optimization

Variable selection in linear regression has been a central topic in statistical research for decades. Bayesian variable selection methods, which account for uncertainty in both the regression coefficients and the noise variance, have achieved broad success through the use of discrete or continuous shrinkage priors and efficient collapsed Gibbs samplers. Despite their popularity and strong empirical performance, an enigma remains: the marginal likelihood, obtained by integrating out the regression coefficients and noise variance, is not log-concave; therefore, there is no guarantee of reliably finding its global optimum. In this article, we study this problem from an optimization perspective. Taking the negative log-marginal likelihood as a loss function of the latent precision parameters, we can rewrite it as a difference of convex functions (DC), and then optimize it via a simple iterative algorithm. Under mild compact set conditions, the DC algorithm converges to the global optimum at a linear rate. The positive finding applies to type-II maximum likelihood and extends to maximum marginal posterior under suitable priors, indicating that the problem of mode finding in Bayesian variable selection is much more benign than the lack of log-concavity might suggest. Besides the theoretical insight, the proposed algorithm is easy to implement, free of tuning, and extensible to structured sparsity, and thus can serve as an efficient alternative or warm-start for traditional Markov chain Monte Carlo solutions. The method is illustrated through numerical studies and a spatial data application for quantifying the aftershock risk following the 2019 Ridgecrest earthquakes. The source code for the algorithm is publicly available at https://github.com/leoduan/dca_optimization_variable_selection.
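
To make the difference-of-convex idea concrete, the sketch below works through a simplified setting that is an assumption of this example, not the article's exact formulation: the noise variance `sigma2` is treated as known rather than integrated out, and the loss is parameterized by prior variances `gamma` (the reciprocals of the latent precisions). In that setting the loss log|S| + y'S^{-1}y, with S = sigma2*I + X diag(gamma) X', splits into a concave log-determinant and a convex quadratic term. Each iteration linearizes the concave part and bounds the convex part at the current posterior mean, which is a majorize-minimize variant with a closed-form update. The function names are hypothetical and are not taken from the linked repository.

```python
import numpy as np


def neg_log_marginal(gamma, X, y, sigma2):
    """Loss log|S| + y' S^{-1} y, with S = sigma2*I + X diag(gamma) X'.

    Assumption of this sketch: sigma2 is known and gamma holds prior variances.
    """
    n = X.shape[0]
    S = sigma2 * np.eye(n) + (X * gamma) @ X.T
    _, logdet = np.linalg.slogdet(S)
    return logdet + y @ np.linalg.solve(S, y)


def dc_style_fit(X, y, sigma2=1.0, n_iter=200):
    """Majorize-minimize iteration built on the concave/convex split of the loss."""
    n, p = X.shape
    gamma = np.ones(p)
    mu = np.zeros(p)
    losses = [neg_log_marginal(gamma, X, y, sigma2)]
    for _ in range(n_iter):
        S = sigma2 * np.eye(n) + (X * gamma) @ X.T
        S_inv_X = np.linalg.solve(S, X)
        # Gradient of the concave term log|S| with respect to gamma_j.
        z = np.einsum("ij,ij->j", X, S_inv_X)
        # Posterior mean of the coefficients at the current gamma.
        mu = gamma * (S_inv_X.T @ y)
        # Minimizer of the surrogate sum_j (z_j * gamma_j + mu_j**2 / gamma_j).
        gamma = np.abs(mu) / np.sqrt(z)
        losses.append(neg_log_marginal(gamma, X, y, sigma2))
    return gamma, mu, np.array(losses)


# Small synthetic check: 3 active predictors out of 20 (illustrative data only).
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = 3.0
y = X @ beta_true + rng.standard_normal(100)

gamma_hat, mu_hat, losses = dc_style_fit(X, y)
print("indices of the three largest prior variances:", np.argsort(gamma_hat)[-3:])
```

The recorded losses should be non-increasing, which is the descent property that surrogate-based iterations of this kind guarantee. The sketch makes no claim about the global convergence rate established in the article.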
