In this paper, we propose a test procedure based on the LASSO methodology for testing the global null hypothesis that a response variable does not depend on any of $p$ predictors when only $n < p$ observations are available. The procedure is analogous to the F-test for a linear model, which assesses significance through the ratio of explained to unexplained variance. The F-test, however, cannot be applied when $p \geq n$: the least-squares fit then reproduces the observations exactly, so the unexplained (residual) variance is zero and the F-statistic is undefined. The proposed extension of the LASSO methodology overcomes this limitation by using the number of non-zero coefficients in the LASSO model as the test statistic, once the regularization parameter has been suitably specified. The method permits reliable analysis of high-dimensional datasets with as few as $n = 40$ observations. Its performance is evaluated in a power study.
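The abstract does not say how the regularization parameter is specified or how the null distribution of the count is obtained. What follows is a minimal Python sketch under assumptions of our own and is not the authors' procedure. The LASSO is fitted by plain coordinate descent (NumPy only). The parameter $\lambda$ is set to a hypothetical quantile of $\lambda_{\max} = \max_j |x_j^\top y|/n$, the smallest penalty at which every coefficient is zero, taken over random permutations of $y$. The p-value then comes from a permutation test on the number of non-zero coefficients. Because this choice of $\lambda$ depends on $y$ only through its values and not their order, permuting $y$ leaves the test valid under the global null. The names `lasso_cd`, `lambda_max` and `lasso_count_test`, and the parameters `n_perm` and `quantile`, are illustrative only.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Coordinate-descent LASSO for (1/(2n))||y - X b||^2 + lam * ||b||_1.

    Assumes X has centred columns and y is centred, so no intercept is fitted.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = beta[j]
            # Correlation of column j with the partial residual (excluding j).
            rho = X[:, j] @ resid / n + col_sq[j] * old
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


def lambda_max(X, y):
    """Smallest lam for which lasso_cd returns the all-zero solution."""
    return np.max(np.abs(X.T @ y)) / len(y)


def lasso_count_test(X, y, n_perm=200, quantile=0.95, rng=None):
    """Hypothetical permutation test using the number of non-zero LASSO coefficients."""
    rng = np.random.default_rng(rng)
    X = X - X.mean(axis=0)
    y = y - y.mean()
    # Assumed rule for lam: a quantile of lambda_max over permuted responses.
    lam_null = [lambda_max(X, rng.permutation(y)) for _ in range(n_perm)]
    lam = np.quantile(lam_null, quantile)

    def count(v):
        return int(np.count_nonzero(lasso_cd(X, v, lam)))

    observed = count(y)
    # Null distribution of the count: refit on independently permuted responses.
    null_counts = [count(rng.permutation(y)) for _ in range(n_perm)]
    p_value = (1 + sum(c >= observed for c in null_counts)) / (1 + n_perm)
    return observed, p_value


# Example with n = 40 < p = 100 and a single active predictor.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 100))
y = 3.0 * X[:, 0] + rng.standard_normal(40)
print(lasso_count_test(X, y, rng=1))
```

In this sketch, `quantile=0.95` means that only about 5% of permuted responses produce any non-zero coefficient. A lower quantile lets more predictors enter, which makes the count more informative against alternatives with many small effects but increases the fitting cost.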