Research on minimum $\ell_2$ norm (ridgeless) interpolation least squares estimators has grown significantly in recent years. However, most of these analyses are limited to a simple regression error structure: independent and identically distributed errors with zero mean and common variance, independent of the feature vectors. Moreover, these theoretical analyses focus mainly on the out-of-sample prediction risk. This paper departs from the existing literature by examining the mean squared error of the ridgeless interpolation least squares estimator under more general assumptions about the regression errors. Specifically, we investigate the potential benefits of overparameterization by characterizing the mean squared error in finite samples. Our findings show that including a large number of unimportant parameters, relative to the sample size, can reduce the mean squared error of the estimator. Notably, we establish that the estimation difficulties associated with the variance term can be summarized through the trace of the variance-covariance matrix of the regression errors.
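To make the objects named in the abstract concrete, the following is a minimal sketch of the estimator and its conditional mean squared error. The notation is an assumption introduced here for illustration and is not taken from the paper: $X$ is an $n \times p$ feature matrix with $p > n$ and full row rank, $y = X\beta + \varepsilon$, $\mathbb{E}[\varepsilon \mid X] = 0$, and $\Omega = \mathrm{Var}(\varepsilon \mid X)$ is a general variance-covariance matrix, not necessarily a multiple of the identity.

$$
\hat{\beta} = X^{+} y = X^{\top} (X X^{\top})^{-1} y, \qquad X \hat{\beta} = y,
$$

$$
\mathbb{E}\!\left[ \| \hat{\beta} - \beta \|^{2} \mid X \right]
= \underbrace{\left\| (I_p - X^{+} X)\, \beta \right\|^{2}}_{\text{bias term}}
+ \underbrace{\operatorname{tr}\!\left( (X X^{\top})^{-1} \Omega \right)}_{\text{variance term}} .
$$

The variance term follows from $\hat{\beta} - \mathbb{E}[\hat{\beta} \mid X] = X^{+} \varepsilon$ and $(X^{+})^{\top} X^{+} = (X X^{\top})^{-1}$. With independent and identically distributed errors, $\Omega = \sigma^{2} I_n$ and the variance term reduces to $\sigma^{2} \operatorname{tr}\!\left( (X X^{\top})^{-1} \right)$. How the general case is summarized through $\operatorname{tr}(\Omega)$ depends on the paper's assumptions about the features, which the abstract does not state, so that step is not reproduced here.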