The least absolute shrinkage and selection operator (Lasso), introduced by Tibshirani (1996), is one of the most widely used regularization methods in regression. The properties of Lasso are known to vary substantially with the choice of the penalty parameter. Recent results of Lahiri (2021) suggest that, depending on the nature of the penalty parameter, Lasso can be either variable selection consistent or $n^{1/2}$-consistent. In practice, however, the penalty parameter is generally chosen in a data-dependent way, most commonly by $K$-fold cross-validation. In this paper, we study the variable selection consistency and $n^{1/2}$-consistency of Lasso when the penalty is chosen by $K$-fold cross-validation with $K$ fixed. We consider the fixed-dimensional heteroscedastic linear regression model and show that Lasso with a $K$-fold cross-validation based penalty is $n^{1/2}$-consistent but not variable selection consistent. As an intermediate result, we also establish the $n^{1/2}$-consistency of the $K$-fold cross-validation based penalty. As a further consequence of $n^{1/2}$-consistency, we establish the validity of the Bootstrap for approximating the distribution of the Lasso estimator based on $K$-fold cross-validation. We validate the Bootstrap approximation in finite samples through a moderate simulation study. Our results thus essentially justify the practical use of $K$-fold cross-validation for drawing inferences based on $n^{1/2}$-scaled pivotal quantities in Lasso regression.
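
For concreteness, the estimator discussed above can be written in one common form. The notation here ($y_i$, $x_i$, $\beta$, $\lambda$, the folds $I_1, \dots, I_K$, and the hatted quantities) is assumed for illustration and is not taken from the paper, whose scaling of the penalty may differ. With responses $y_i$ and covariates $x_i \in \mathbb{R}^p$, $i = 1, \dots, n$, and $p$ fixed, the Lasso estimator at penalty level $\lambda \ge 0$ is

$$\hat{\beta}_n(\lambda) = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^{n} \left(y_i - x_i^\top \beta\right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|.$$

Partition $\{1, \dots, n\}$ into $K$ folds $I_1, \dots, I_K$ and let $\hat{\beta}_{n,-k}(\lambda)$ denote the Lasso estimator computed without the observations in $I_k$. A $K$-fold cross-validation based penalty then minimizes the total held-out prediction error,

$$\hat{\lambda}_{n,K} \in \operatorname*{arg\,min}_{\lambda \ge 0} \; \sum_{k=1}^{K} \sum_{i \in I_k} \left(y_i - x_i^\top \hat{\beta}_{n,-k}(\lambda)\right)^2,$$

and the estimator used in practice is $\hat{\beta}_n(\hat{\lambda}_{n,K})$, in which the penalty is itself a function of the data.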