We propose a unified framework for inference on regression coefficients in a generalized linear model (GLM) after Lasso-based variable selection. Drawing parallels between maximum likelihood estimation in GLMs and least squares estimation in linear models, we extend to non-Gaussian GLMs a recently developed parametric programming strategy for post-selection inference in the linear model with a Gaussian response. Post-selection inference is then carried out on a linearized model for pseudo response and covariate data constructed from the raw data. In simulation experiments with synthetic data generated from regression models for three different types of non-Gaussian responses, we demonstrate that the proposed method effectively corrects the naive inference that ignores variable selection while achieving greater efficiency than a polyhedral-based post-selection adjustment.
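To make the parallel between maximum likelihood estimation in a GLM and least squares in a linear model concrete, the following is a minimal sketch for logistic regression. It rests on assumptions not stated in the abstract: the pseudo data are built from the standard iteratively reweighted least squares (IRLS) working response and weights, evaluated at the unpenalized maximum likelihood estimate, and the helper names `fit_logistic_irls` and `pseudo_data` are hypothetical. The construction actually used in the proposed framework may differ, and the sketch performs neither Lasso selection nor post-selection inference.

```python
import numpy as np


def fit_logistic_irls(X, y, n_iter=50, tol=1e-10):
    """Unpenalized logistic regression MLE via IRLS (illustrative helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        mu = 1.0 / (1.0 + np.exp(-eta))
        w = mu * (1.0 - mu)            # IRLS weights for the logit link
        z = eta + (y - mu) / w         # working response
        XtW = X.T * w                  # X^T W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def pseudo_data(X, y, beta):
    """Assumed construction: pseudo response W^{1/2} z and pseudo covariates W^{1/2} X."""
    eta = X @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    w = mu * (1.0 - mu)
    z = eta + (y - mu) / w
    sw = np.sqrt(w)
    return sw * z, sw[:, None] * X


# Synthetic binary-response data (illustrative values only).
rng = np.random.default_rng(0)
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([-0.5, 1.0, 0.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)

beta_hat = fit_logistic_irls(X, y)
y_star, X_star = pseudo_data(X, y, beta_hat)

# Ordinary least squares on the pseudo data reproduces the GLM estimate.
beta_ols, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
```

At convergence the score equation X^T (y - mu) = 0 holds, so the least squares fit on `(X_star, y_star)` coincides with `beta_hat` up to numerical tolerance. This is the sense in which a method designed for the Gaussian linear model could, under the assumptions above, be applied to a linearized version of a non-Gaussian GLM.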