Lasso regression is a popular regularization method for feature selection in statistics. Before computing the Lasso estimator in either linear or generalized linear models, it is common to rescale the feature matrix in a preliminary step so that all features are standardized. Without this standardization, it is argued, the Lasso estimate depends on the units in which the features are measured. We propose a new type of iterative rescaling of the features in the context of generalized linear models. Whereas existing Lasso algorithms perform a single scaling as a preprocessing step, the proposed rescaling is applied iteratively throughout the Lasso computation until convergence. We provide numerical examples, with both real and simulated data, illustrating that the proposed iterative rescaling can significantly improve the statistical performance of the Lasso estimator without incurring any significant additional computational cost.
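The unit dependence mentioned above can be illustrated with a minimal sketch, restricted to the linear model. It shows only the conventional single rescaling applied as a preprocessing step, not the proposed iterative rescaling, whose details are not stated here. The helper names (`standardize`, `lasso_cd`, `fitted`), the simulated data, and the penalty level `lam = 0.5` are all illustrative assumptions.

```python
import math
import random


def standardize(X):
    """Center each column and scale it to unit (population) standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n)
           for j in range(p)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # residual y - X b, kept up to date after each update
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of column j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def fitted(X, b):
    """Fitted values X b."""
    return [sum(x * c for x, c in zip(row, b)) for row in X]


def max_gap(u, v):
    """Largest absolute difference between two equal-length sequences."""
    return max(abs(a - c) for a, c in zip(u, v))


# Simulated data (assumption): three features, two of which drive the response.
random.seed(0)
n = 50
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [2.0 * r[0] - 1.0 * r[1] + 0.1 * random.gauss(0, 1) for r in X]

# Same data with the first feature recorded in units 1000 times smaller.
X_units = [[1000.0 * r[0], r[1], r[2]] for r in X]
lam = 0.5

# Without standardization, the fitted values change with the units.
gap_raw = max_gap(fitted(X, lasso_cd(X, y, lam)),
                  fitted(X_units, lasso_cd(X_units, y, lam)))

# With a single preprocessing standardization, the fit no longer depends on units.
Z, Z_units = standardize(X), standardize(X_units)
gap_std = max_gap(fitted(Z, lasso_cd(Z, y, lam)),
                  fitted(Z_units, lasso_cd(Z_units, y, lam)))

print("max change in fitted values, raw features:         ", gap_raw)
print("max change in fitted values, standardized features:", gap_std)
```

Changing the units of a feature changes how strongly the same penalty `lam` acts on its coefficient, which is why the raw fits disagree while the standardized fits coincide up to floating-point error.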