Penalized linear regression is fundamental to high-dimensional statistics and is routinely used to regress a response on a high-dimensional set of predictors. In many scientific applications, external information is available that encodes the predictive power and sparsity structure of the predictors. In this article, we propose the Structure Adaptive Elastic-Net (SA-Enet), a new framework for incorporating potentially useful side information into penalized regression. The basic idea is to translate the external information into different penalization strengths for the regression coefficients. We focus in particular on group and covariate-dependent structures and study the risk properties of the resulting estimator. To this end, we generalize the state evolution framework, recently introduced for the analysis of the approximate message-passing algorithm, to the SA-Enet framework. We show that the finite-sample risk of the SA-Enet estimator is consistent with the theoretical risk predicted by the state evolution equation. Our theory suggests that the SA-Enet with an informative group or covariate structure can outperform the Lasso, Adaptive Lasso, Sparse Group Lasso, Feature-weighted Elastic-Net, and Graper. Our numerical studies further support this finding. We also demonstrate the usefulness and superiority of our method on leukemia data from molecular biology and precision medicine.
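The basic idea stated above, translating external information into coefficient-specific penalization strengths, can be illustrated with a minimal sketch. This is not the SA-Enet estimator or the approximate message-passing algorithm studied in the article; it is a plain coordinate-descent solver for an elastic-net objective with per-coefficient penalties, under the assumption that the objective takes the form (1/2n)‖y − Xβ‖² + Σ_j [λ1_j |β_j| + (λ2_j / 2) β_j²]. The function names `weighted_enet` and `penalties_from_groups`, the group labels, and the penalty values are all hypothetical and chosen only for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def weighted_enet(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for an elastic net with per-coefficient penalties.

    Assumed objective (an illustration, not the article's exact formulation):
        (1 / 2n) * ||y - X b||^2 + sum_j (lam1[j] * |b_j| + lam2[j] / 2 * b_j^2)
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()      # residual y - X @ beta, kept up to date
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of column j with the partial residual (beta_j excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam1[j]) / (col_sq[j] + lam2[j])
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def penalties_from_groups(groups, group_lam1, group_lam2):
    """Map a group label per predictor to per-coefficient penalty vectors."""
    groups = np.asarray(groups)
    return np.asarray(group_lam1)[groups], np.asarray(group_lam2)[groups]


# Hypothetical example: group 0 is believed informative, group 1 is not.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.concatenate([np.ones(10), np.zeros(10)])
y = X @ beta_true + rng.standard_normal(n)

groups = [0] * 10 + [1] * 10
lam1, lam2 = penalties_from_groups(groups, group_lam1=[0.01, 0.5],
                                   group_lam2=[0.0, 0.1])
beta_hat = weighted_enet(X, y, lam1, lam2)
```

In this sketch the side information is a group label per predictor, and a weaker penalty on the group assumed to be informative leaves its coefficients nearly unshrunk while the heavily penalized group is driven toward zero. How the penalization strengths are actually chosen from the group or covariate structure is the subject of the article and is not reproduced here.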