Selective inference methods are developed for group lasso estimators for use with a wide class of distributions and loss functions. The approach covers exponential family distributions as well as, for example, quasi-likelihood modeling for overdispersed count data, and allows for categorical or grouped covariates as well as continuous covariates. A randomized group-regularized optimization problem is studied. The added randomization allows us to construct a post-selection likelihood, which we show to be adequate for selective inference when conditioning on the event that the grouped covariates are selected. This likelihood also provides a selective point estimator that accounts for the selection by the group lasso. Confidence regions for the regression parameters in the selected model take the form of Wald-type regions and are shown to have bounded volume. The selective inference method for the group lasso is illustrated on data from the National Health and Nutrition Examination Survey, while simulations showcase its behaviour and its favorable comparison with other methods.
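As a rough illustration of what a randomized group-regularized problem can look like, the sketch below writes one common form of such an objective. The notation is assumed for illustration and is not taken from the abstract: $\ell$ denotes the loss (for example a negative log-likelihood or quasi-likelihood), $\mathcal{G}$ the collection of predefined covariate groups, $\lambda_g$ a group-specific tuning parameter, and $\omega$ a Gaussian randomization variable with covariance $\Omega$, drawn independently of the data.

```latex
% Assumed notation (not from the abstract): \ell, \mathcal{G}, \lambda_g, \omega, \Omega.
% Randomized group lasso: the usual group-penalized loss plus a linear
% perturbation term -\omega^\top \beta.
\[
  \hat{\beta} \in \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \Big\{ \ell(\beta) + \sum_{g \in \mathcal{G}} \lambda_g \, \lVert \beta_g \rVert_2
         - \omega^\top \beta \Big\},
  \qquad \omega \sim \mathcal{N}(0, \Omega).
\]
% Selection event: the set of groups with a nonzero estimated coefficient block.
\[
  \hat{E} = \{\, g \in \mathcal{G} : \hat{\beta}_g \neq 0 \,\}.
\]
```

Under this sketch, selective inference for the parameters of the selected model would condition on the observed value of $\hat{E}$, which corresponds to the selection event of the grouped covariates mentioned above.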