Regularized models are often sensitive to the scales of the features in the data, and it has therefore become standard practice to normalize (center and scale) the features before fitting the model. But features can be normalized in many different ways, and the choice may have dramatic effects on the resulting model. In spite of this, there has so far been no research on this topic. In this paper, we begin to bridge this knowledge gap by studying normalization in the context of lasso, ridge, and elastic net regression. We focus on normal and binary features and show that the class balances of binary features directly influence the regression coefficients and that this effect depends on the combination of normalization and regularization methods used. We demonstrate that this effect can be mitigated by scaling binary features with their variance in the case of the lasso and with their standard deviation in the case of ridge regression, but that this comes at the cost of increased variance. For the elastic net, we show that scaling the penalty weights, rather than the features, can achieve the same effect. Finally, we also tackle mixes of binary and normal features as well as interactions, and provide some initial results on how to normalize features in these cases.
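The two scalings discussed above differ only in the exponent applied to the spread of a binary feature: standard-deviation scaling divides the centered feature by sqrt(q(1-q)), where q is the fraction of ones, while variance scaling divides by q(1-q). A minimal NumPy sketch of both options (variable names and the illustrative class balance q = 0.2 are our own, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
q = 0.2  # class balance: fraction of ones in the binary feature (illustrative)
x = (rng.random(n) < q).astype(float)

x_centered = x - x.mean()

# Standard-deviation scaling (conventional standardization; suggested for ridge).
# For a binary feature, x.std() estimates sqrt(q * (1 - q)).
x_std_scaled = x_centered / x.std()

# Variance scaling (suggested for the lasso).
# For a binary feature, x.var() estimates q * (1 - q).
x_var_scaled = x_centered / x.var()
```

After standard-deviation scaling the feature has unit standard deviation regardless of q; after variance scaling its standard deviation is 1/sqrt(q(1-q)), which grows as the class balance becomes more extreme and is what counteracts the shrinkage of coefficients on imbalanced binary features under the lasso.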