We develop novel LASSO-based methods for coefficient testing and confidence interval construction in the Gaussian linear model with $n\ge d$. Our methods have finite-sample guarantees identical to those of their ubiquitous ordinary-least-squares $t$-test-based analogues, yet have substantially higher power when the true coefficient vector is sparse. In particular, under sparsity our coefficient test, which we call the $\ell$-test, performs like the one-sided $t$-test despite being given no information about the coefficient's sign, and the corresponding confidence intervals are more than 10% shorter than the standard $t$-test-based intervals. The nature of the $\ell$-test directly yields a novel exact adjustment for post-selection inference conditional on LASSO selection, allowing for the construction of post-selection p-values and confidence intervals. None of our methods require resampling or Monte Carlo estimation. We perform a variety of simulations and a real data analysis on an HIV drug resistance data set to demonstrate the benefits of the $\ell$-test. We end with a discussion of how the $\ell$-test may apply asymptotically to a much more general class of parametric models.