We examine the connection between training error and generalization error for arbitrary estimation procedures in an overparameterized linear model with a general prior, within a Bayesian framework. We identify determining factors inherent to the prior distribution $\pi$ and give explicit conditions under which optimal generalization requires the training error to be (i) nearly interpolating relative to the noise level (i.e., memorization is necessary), or (ii) close to the noise level (i.e., overfitting is harmful). Remarkably, these phenomena occur when the noise crosses thresholds determined by the Fisher information and the variance parameters of the prior $\pi$.