The least absolute shrinkage and selection operator (Lasso) is one of the most widely used regularization methods in regression. In practice, statisticians usually implement the Lasso by choosing the penalty parameter in a data-dependent way, most popularly through $K$-fold cross-validation ($K$-fold CV). However, the inferential properties of the $K$-fold CV based Lasso estimator, such as variable selection consistency and $n^{1/2}$-consistency, as well as the validity of the Bootstrap approximation to its distribution, remain unknown. In this paper, we consider the heteroscedastic linear regression model and show, under only some moment-type conditions, that the Lasso estimator with a $K$-fold CV based penalty is $n^{1/2}$-consistent but not variable selection consistent. Additionally, we establish the validity of the Bootstrap in approximating the distribution of the $K$-fold CV based Lasso estimator. Our results therefore provide theoretical justification for using the $K$-fold CV based Lasso estimator to perform statistical inference in linear regression. We validate our Bootstrap method for the $K$-fold CV based Lasso estimator in finite samples through simulations, and we also apply our Bootstrap based inference to a real data set.
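As a concrete illustration of the pipeline the abstract describes, here is a minimal Python sketch: a Lasso fit with the penalty chosen by $K$-fold CV, followed by a Bootstrap approximation of the estimator's distribution. The simulated heteroscedastic design, scikit-learn's `LassoCV`/`Lasso`, and the simple pairs bootstrap are all illustrative assumptions; the paper's exact Bootstrap scheme for heteroscedastic errors may differ.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)

# Simulated heteroscedastic linear model: y = X @ beta + sigma(X) * eps,
# where the error scale depends on the covariates (illustrative design).
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]
scale = 0.5 + 0.5 * np.abs(X[:, 0])
y = X @ beta + scale * rng.standard_normal(n)

# Step 1: Lasso with the penalty chosen by K-fold CV (K = 5 here).
fit = LassoCV(cv=5, random_state=0).fit(X, y)
beta_hat = fit.coef_

# Step 2: pairs bootstrap to approximate the distribution of the
# CV-based Lasso estimator; resampling (x_i, y_i) pairs is one simple
# scheme that tolerates heteroscedastic errors (a stand-in, not
# necessarily the paper's method).
B = 500
boot = np.empty((B, p))
for b in range(B):
    idx = rng.integers(0, n, size=n)  # resample observation indices
    boot[b] = Lasso(alpha=fit.alpha_).fit(X[idx], y[idx]).coef_

# Percentile confidence interval for the first coefficient.
lo, hi = np.percentile(boot[:, 0], [2.5, 97.5])
print(f"CV penalty = {fit.alpha_:.4f}, beta_1 hat = {beta_hat[0]:.3f}, "
      f"95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")
```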