Generalized linear models (GLMs) constitute a large class of models that extends ordinary linear regression by connecting the mean of the response variable to the covariates through an appropriate link function. The Lasso, in turn, is a popular and easy-to-implement penalization method for regression when not all covariates are relevant. However, the asymptotic distributional properties of the Lasso estimator in GLMs are still unknown. In this paper, we show that the Lasso estimator in GLMs does not have a tractable form and, subsequently, we develop two Bootstrap methods, namely the Perturbation Bootstrap and Pearson's Residual Bootstrap, for approximating the distribution of the Lasso estimator in GLMs. As a result, our Bootstrap methods can be used to draw valid statistical inferences for any sub-model of GLM. We support our theoretical findings by showing good finite-sample properties of the proposed Bootstrap methods through a moderately large simulation study. We also apply one of our Bootstrap methods to a real data set.
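For concreteness, the object under study can be written in a standard form. The display below is only a sketch under stated assumptions: it takes a canonical link, and the symbols $y_i$, $x_i$, $b(\cdot)$, $g$, and $\lambda_n$ are illustrative notation introduced here, so the scaling and notation used in the paper itself may differ.

```latex
% Assumed setup: responses y_i with exponential-family density
%   f(y_i \mid \theta_i) = \exp\{ y_i \theta_i - b(\theta_i) \} \, c(y_i),
% canonical parameter \theta_i = x_i^\top \beta, and link g satisfying
%   g(\mu_i) = x_i^\top \beta, \qquad \mu_i = E(y_i) = b'(x_i^\top \beta).
\[
  \hat{\beta}_n
  = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
    \left\{
      -\sum_{i=1}^{n} \bigl[ y_i \, x_i^\top \beta - b(x_i^\top \beta) \bigr]
      + \lambda_n \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\}
\]
```

Under these assumptions, $b(u) = u^2/2$ gives ordinary least-squares Lasso, and $b(u) = \log(1 + e^{u})$ gives Lasso-penalized logistic regression. The $\ell_1$ penalty is not differentiable at zero, which is the usual reason the estimator has no closed form in general.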