Deep learning research has uncovered the phenomenon of benign overfitting for overparameterized statistical models, which has drawn significant theoretical interest in recent years. Given its simplicity and practicality, the ordinary least squares (OLS) interpolator has become essential for gaining foundational insights into this phenomenon. While properties of OLS are well established in classical, underparameterized settings, its behavior in high-dimensional, overparameterized regimes is less explored (unlike that of ridge or lasso regression), though significant progress has been made recently. We contribute to this growing literature by providing fundamental algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In particular, we provide algebraic equivalents of (i) the leave-$k$-out residual formula, (ii) Cochran's formula, and (iii) the Frisch-Waugh-Lovell theorem in the overparameterized regime. These results aid in understanding the OLS interpolator's ability to generalize and have substantive implications for causal inference. Under the Gauss-Markov model, we present statistical results such as an extension of the Gauss-Markov theorem and an analysis of variance estimation under homoskedastic errors for the overparameterized regime. To substantiate our theoretical contributions, we conduct simulations that further explore the stochastic properties of the OLS interpolator.
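As a minimal illustration of the object studied here (not code from the paper): in the overparameterized regime with fewer observations than features, the minimum $\ell_2$-norm OLS interpolator can be computed via the Moore-Penrose pseudoinverse, and it fits the training data exactly. The dimensions and random data below are hypothetical choices for the sketch.

```python
import numpy as np

# Hedged sketch: minimum l2-norm OLS interpolator in an
# overparameterized setting (n < p), computed as beta_hat = X^+ y.
rng = np.random.default_rng(0)
n, p = 20, 100  # fewer observations than features
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_hat = np.linalg.pinv(X) @ y

# The interpolator fits the training data exactly (zero residuals).
assert np.allclose(X @ beta_hat, y)

# np.linalg.lstsq also returns the minimum-norm least-squares
# solution, so the two computations agree.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```

Among all coefficient vectors that interpolate the data, $X^+ y$ is the one with the smallest $\ell_2$ norm, which is why it serves as the canonical interpolating estimator in this literature.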