A finite sample analysis of the benign overfitting phenomenon for ridge function estimation

Extensive recent numerical experiments in large-scale machine learning have uncovered a rather counterintuitive phase transition, governed by the ratio between the sample size and the number of parameters in the model. As the number of parameters $p$ approaches the sample size $n$, the generalisation error increases but, surprisingly, it starts decreasing again past the threshold $p=n$. This phenomenon, brought to the attention of the theoretical community in \cite{belkin2019reconciling}, has been thoroughly investigated lately, in particular for models simpler than deep neural networks, such as the linear model in which the parameter is taken to be the minimum norm solution to the least-squares problem. It was studied first in the asymptotic regime where $p$ and $n$ tend to infinity, see e.g. \cite{hastie2019surprises}, and more recently in the finite dimensional regime for linear models \cite{bartlett2020benign}, \cite{tsigler2020benign}, \cite{lecue2022geometrical}. In the present paper, we propose a finite sample analysis of non-linear models of \textit{ridge} type, in which we investigate the \textit{overparametrised regime} of the double descent phenomenon for both the \textit{estimation problem} and the \textit{prediction problem}. Our results provide a precise analysis of the distance of the best estimator from the true parameter, as well as a generalisation bound that complements the recent works \cite{bartlett2020benign} and \cite{chinot2020benign}. Our analysis is based on tools closely related to the continuous Newton method \cite{neuberger2007continuous} and on a refined quantitative analysis of the prediction performance of the minimum $\ell_2$-norm solution.
