The relationship between the number of training data points, the number of parameters in a statistical model, and the model's generalization capabilities has been widely studied. Previous work has shown that double descent can occur in the over-parameterized regime, and it is commonly believed that the standard bias-variance trade-off holds in the under-parameterized regime. In this paper, we present a simple example that provably exhibits double descent in the under-parameterized regime. For simplicity, we consider the ridge-regularized least squares denoising problem with data on a line embedded in a high-dimensional space. By deriving an asymptotically accurate formula for the generalization error, we observe sample-wise and parameter-wise double descent with the peak in the under-parameterized regime rather than at the interpolation point or in the over-parameterized regime. Further, the peak of the sample-wise double descent curve corresponds to a peak in the curve for the norm of the estimator, and adjusting $\mu$, the strength of the ridge regularization, shifts the location of the peak. Parameter-wise double descent occurs for this model when $\mu$ is small. For larger values of $\mu$, the curve for the norm of the estimator still has a peak, but this peak no longer translates into a peak in the generalization error. We also study the training error for this problem. Finally, the problem setup allows us to study the interaction between two regularizers: the ridge penalty and the noise added to the input data. We provide empirical evidence that the model implicitly favors the ridge regularizer over the input-noise regularizer. Thus, even though both regularizers act on the same quantity, namely the norm of the estimator, they are not equivalent.
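To make the setting concrete, the following is a minimal numerical sketch of ridge-regularized least squares denoising for data on a line. It is not the paper's code, and several details are assumptions made for illustration: the clean data are taken to be $X = u z^\top$ with a unit vector $u$ and standard Gaussian coefficients $z$, the noise $A$ is taken to be Gaussian with entries of variance $1/d$, and the objective is assumed to be $\|X - W(X + A)\|_F^2 + \mu^2 \|W\|_F^2$. The function names (`rank_one_data`, `ridge_denoiser`, `experiment`) are hypothetical.

```python
import numpy as np


def rank_one_data(u, n, rng):
    """Sample n points on the line spanned by u (assumed data model)."""
    z = rng.standard_normal(n)          # coefficient of each point along u
    return np.outer(u, z)               # shape (d, n), rank one


def ridge_denoiser(X, A, mu):
    """Minimize ||X - W (X + A)||_F^2 + mu^2 ||W||_F^2 over W (assumed objective).

    The minimizer is W = X Y^T (Y Y^T + mu^2 I)^{-1} with Y = X + A.
    """
    Y = X + A
    d = Y.shape[0]
    G = Y @ Y.T + mu**2 * np.eye(d)     # symmetric positive definite for mu > 0
    # Solve G W^T = Y X^T instead of forming the inverse explicitly.
    return np.linalg.solve(G, Y @ X.T).T


def experiment(d, n, mu, n_test=500, seed=0):
    """Return (empirical test error, Frobenius norm of W) for one random draw."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)              # unit direction of the line

    X_trn = rank_one_data(u, n, rng)
    A_trn = rng.standard_normal((d, n)) / np.sqrt(d)   # assumed noise scale
    W = ridge_denoiser(X_trn, A_trn, mu)

    X_tst = rank_one_data(u, n_test, rng)
    A_tst = rng.standard_normal((d, n_test)) / np.sqrt(d)
    residual = X_tst - W @ (X_tst + A_tst)
    test_err = np.linalg.norm(residual) ** 2 / n_test
    return test_err, np.linalg.norm(W)


if __name__ == "__main__":
    d, mu = 100, 0.1                    # illustrative values, not from the paper
    for n in (25, 50, 100, 200, 400, 800):
        err, w_norm = experiment(d, n, mu)
        print(f"n={n:4d}  test error={err:.4f}  ||W||_F={w_norm:.4f}")
```

Sweeping `n` at fixed `d` gives a rough sample-wise curve, and sweeping `d` at fixed `n` gives a parameter-wise one; comparing the test error with `||W||_F` for several values of `mu` is one way to probe the relationship described above. A single random draw is noisy, so averaging over seeds would be needed before reading anything into the shape of the curve.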