Motivated by the CATHGEN data, we develop a new statistical learning method for simultaneous variable selection and parameter estimation in generalized partly linear models for data with high-dimensional covariates. The method, referred to as the broken adaptive ridge (BAR) estimator, approximates $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression. The generalized partly linear model extends the generalized linear model by including a non-parametric component, yielding a flexible model for various types of covariate effects. We employ Bernstein polynomials as the sieve space to approximate the non-parametric functions, so that our method can be implemented easily using existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data, which come from a coronary artery disease study with a binary response and motivated our research, and obtain new findings for both the high-dimensional genetic and the low-dimensional non-genetic covariates.
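To make the idea of iteratively reweighted squared $L_2$-penalized regression concrete, the following is a minimal sketch in Python rather than R. It rests on several simplifying assumptions that are not taken from the abstract: a squared-error loss instead of a generalized partly linear model, no non-parametric (Bernstein polynomial) component, and weights equal to the inverse squares of the previous coefficient estimates. The function name `bar_least_squares` and the parameters `lam`, `xi`, `tol`, and `zero_tol` are hypothetical and chosen only for illustration.

```python
import numpy as np


def bar_least_squares(X, y, lam=5.0, xi=1.0, max_iter=200, tol=1e-8, zero_tol=1e-6):
    """Illustrative BAR-style iteration for squared-error loss.

    Assumption: each step minimises
        ||y - X b||^2 + lam * sum_j b_j^2 / beta_j^2,
    where beta is the estimate from the previous step.
    """
    p = X.shape[1]
    gram = X.T @ X
    xty = X.T @ y

    # Initial estimate: ordinary ridge regression with penalty xi.
    beta = np.linalg.solve(gram + xi * np.eye(p), xty)

    for _ in range(max_iter):
        # Substituting b = beta * u turns the weighted penalty into lam * ||u||^2,
        # which avoids dividing by coefficients that are shrinking towards zero.
        scaled_gram = gram * np.outer(beta, beta)
        u = np.linalg.solve(scaled_gram + lam * np.eye(p), beta * xty)
        new_beta = beta * u

        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break

    # Coefficients that have collapsed numerically are set exactly to zero.
    beta[np.abs(beta) < zero_tol] = 0.0
    return beta


if __name__ == "__main__":
    # Simulated data: only the first three of ten covariates are active.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    true_beta = np.array([3.0, -2.0, 1.5] + [0.0] * 7)
    y = X @ true_beta + rng.standard_normal(200)
    print(bar_least_squares(X, y))
```

In this sketch, coefficients with weak signal are driven towards zero across iterations while strong signals stay close to their least-squares values, which is how the reweighting mimics an $L_0$ penalty. Extending the sketch to a binary response would require replacing the closed-form ridge step with a penalized likelihood fit.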