Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The methods cover exponential family distributions as well as quasi-likelihood modeling, for example for overdispersed count data, and allow for categorical or grouped covariates as well as continuous covariates. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood, which we show to be adequate for selective inference when conditioning on the event that the grouped covariates are selected. This likelihood also provides a selective point estimator that accounts for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for the group lasso is illustrated on data from the National Health and Nutrition Examination Survey, while simulations showcase its behaviour and its favorable comparison with other methods.
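To make the selection step concrete, the following is a minimal sketch of a randomized group-regularized problem, not the authors' implementation. It assumes a squared-error loss (the abstract covers far more general losses), a linear randomization term `-omega @ b` with Gaussian `omega`, and a plain proximal gradient solver; the function names, the group layout, the penalty level, and the scale of the randomization are all hypothetical choices made for illustration. The post-selection likelihood and the Wald-type regions are not implemented here.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Block soft-thresholding: proximal operator of t * ||v||_2."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v

def randomized_group_lasso(X, y, groups, lam, omega, n_iter=2000):
    """Proximal gradient for the (assumed) objective
        0.5 * ||y - X b||^2 - omega @ b + lam * sum_g ||b_g||_2.
    `groups` is a list of index arrays, one per group (hypothetical layout).
    """
    b = np.zeros(X.shape[1])
    # Step size 1/L, where L is the Lipschitz constant of the smooth part.
    step = 1.0 / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) - omega  # gradient of loss plus randomization
        z = b - step * grad
        for g in groups:
            b[g] = group_soft_threshold(z[g], step * lam)
    return b

# Simulated data: three groups of three covariates, only the first is active.
rng = np.random.default_rng(0)
n = 200
groups = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
X = rng.standard_normal((n, 9))
beta_true = np.zeros(9)
beta_true[groups[0]] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
omega = rng.standard_normal(9)  # randomization draw; scale is illustrative only

beta_hat = randomized_group_lasso(X, y, groups, lam=40.0, omega=omega)
# A group counts as selected when its coefficient block is not exactly zero.
selected = [k for k, g in enumerate(groups) if np.any(beta_hat[g] != 0)]
print("selected groups:", selected)
```

The set `selected` plays the role of the selection event on which inference would then be conditioned. Because block soft-thresholding sets whole groups to exactly zero, a categorical covariate encoded by several dummy columns enters or leaves the model as a unit.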