We develop selective inference methods for group lasso estimators that apply to a wide class of distributions and loss functions. The framework covers exponential family distributions as well as quasi-likelihood models, for example for overdispersed count data, and accommodates categorical or grouped covariates alongside continuous ones. We study a randomized group-regularized optimization problem. The added randomization allows us to construct a post-selection likelihood, which we show to be adequate for selective inference when conditioning on the event that the grouped covariates are selected. This likelihood also yields a selective point estimator that accounts for selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The method is illustrated on data from the National Health and Nutrition Examination Survey, and simulations showcase its behaviour and its favorable comparison with other methods.
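To make the randomized group-regularized problem concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not state the objective, so the sketch assumes a common form from the randomized selective inference literature: squared-error loss, a linear Gaussian perturbation term `-omega @ beta` with `omega ~ N(0, tau^2 I)`, and a group penalty weighted by the square root of the group size. The function names, the parameters `lam` and `tau`, and the proximal gradient solver are all illustrative choices. The sketch covers only the selection step; the post-selection likelihood and the Wald-type regions are not implemented.

```python
import numpy as np


def group_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2 (block soft-thresholding)."""
    norm = np.linalg.norm(v)
    if norm <= thresh:
        return np.zeros_like(v)
    return (1.0 - thresh / norm) * v


def randomized_group_lasso(X, y, groups, lam, tau=1.0, n_iter=500, seed=0):
    """Proximal gradient sketch for the ASSUMED randomized objective

        0.5 * ||y - X b||^2 - omega @ b + lam * sum_g sqrt(|g|) * ||b_g||_2,

    with omega ~ N(0, tau^2 I). `groups` is a list of index lists that
    partition the columns of X. Returns (beta, selected groups, omega).
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    omega = rng.normal(scale=tau, size=p)      # randomization term (assumed Gaussian)
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) - omega    # gradient of the smooth part
        z = beta - step * grad
        for g in groups:
            weight = np.sqrt(len(g))           # group-size weight (assumed)
            beta[g] = group_soft_threshold(z[g], step * lam * weight)
    selected = [i for i, g in enumerate(groups) if np.any(beta[g] != 0)]
    return beta, selected, omega


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 6))
    y = X[:, :2] @ np.array([5.0, 5.0]) + rng.normal(size=100)
    groups = [[0, 1], [2, 3], [4, 5]]
    beta, selected, omega = randomized_group_lasso(X, y, groups, lam=5.0)
    print("selected groups:", selected)
```

Selective inference would then condition on the event that exactly the groups in `selected` were chosen. Keeping `omega` alongside the solution reflects that the randomization is part of what the conditional analysis has to account for.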