In this study, we explore the effects of including noise predictors and noise observations when fitting linear regression models. We present empirical and theoretical results showing that double descent occurs in both cases, albeit with contradictory implications: the implication for noise predictors is that complex models are often better than simple ones, while the implication for noise observations is that simple models are often better than complex ones. We resolve this contradiction by showing that it is not the model complexity but rather the implicit shrinkage induced by including noise in the model that drives the double descent. Specifically, we show how noise predictors or observations shrink the regression coefficient estimators and cause the test error to asymptote, and then how the asymptotes of the test error and the ``condition number anomaly'' ensure that double descent occurs. We also show that including noise observations in the model makes the (usually unbiased) ordinary least squares estimator biased, and indicates that the ridge regression estimator may need a negative ridge parameter to avoid over-shrinkage.
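The shrinkage effect described above can be illustrated with a minimal simulation, assuming a standard Gaussian design: appending pure-noise columns to the design matrix and taking the minimum-norm least-squares fit shrinks the estimated coefficients on the true predictors, much like an implicit ridge penalty. The sample sizes, seed, and variable names below are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 5, 500  # samples, true predictors, appended noise predictors
beta = np.ones(p)

X = rng.standard_normal((n, p))
y = X @ beta + 0.5 * rng.standard_normal(n)

# OLS using only the true predictors
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]

# Augment the design with pure-noise predictors; with p + q > n the
# minimum-norm least-squares solution is given by the pseudoinverse
Z = np.hstack([X, rng.standard_normal((n, q))])
beta_min_norm = np.linalg.pinv(Z) @ y

# The coefficients on the true predictors are shrunk toward zero,
# since Z Z^T ~ X X^T + q I acts like a ridge penalty of size q
print(np.linalg.norm(beta_ols))           # near ||beta|| = sqrt(5)
print(np.linalg.norm(beta_min_norm[:p]))  # substantially smaller
```

Sweeping `q` from below to above the interpolation threshold `n - p` in this setup traces out the double-descent curve in test error that the abstract refers to.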