In recent years, there has been much interest in understanding the generalization behavior of interpolating predictors, which perfectly fit noisy training data. Whereas standard analyses ask whether a method is consistent, recent observations have shown that even inconsistent predictors can generalize well. In this work, we revisit the classic interpolating Nadaraya-Watson (NW) estimator (also known as Shepard's method) and study its generalization capabilities through this modern viewpoint. In particular, by varying a single bandwidth-like hyperparameter, we prove the existence of multiple overfitting behaviors, ranging non-monotonically from catastrophic, through benign, to tempered. Our results highlight how even classical interpolating methods can exhibit intricate generalization behaviors. Moreover, for the purpose of tuning this hyperparameter, our results suggest that over-estimating the intrinsic dimension of the data is less harmful than under-estimating it. Numerical experiments complement our theory and demonstrate the same phenomena.
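For concreteness, below is a minimal sketch of the interpolating NW (Shepard) predictor, assuming its standard inverse-distance-weighting form with a singular kernel K(d) = d^(-gamma). The exponent `gamma` stands in for the bandwidth-like hyperparameter discussed above, and the function name `shepard_predict` is illustrative rather than taken from the source.

```python
import numpy as np

def shepard_predict(X_train, y_train, X_query, gamma=2.0, eps=1e-12):
    """Interpolating Nadaraya-Watson (Shepard) estimator with the
    singular inverse-distance kernel K(d) = d**(-gamma).

    Because the kernel diverges as d -> 0, the prediction at a training
    input equals its (possibly noisy) label, so the estimator
    interpolates the training data.
    """
    preds = np.empty(len(X_query))
    for i, x in enumerate(X_query):
        d = np.linalg.norm(X_train - x, axis=1)       # distances to all training points
        hit = d < eps                                 # query coincides with a training point
        if hit.any():
            preds[i] = y_train[hit].mean()            # exact interpolation at training points
        else:
            w = d ** (-gamma)                         # singular kernel weights
            preds[i] = np.dot(w, y_train) / w.sum()   # normalized weighted average of labels
    return preds

# Usage: on noisy labels, the predictor reproduces the training labels exactly,
# while gamma controls how local the weighted average is away from them.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = np.sin(4 * X[:, 0]) + 0.3 * rng.standard_normal(200)  # noisy labels
print(shepard_predict(X, y, X[:5], gamma=2.0))  # matches y[:5]: interpolation
print(y[:5])
```

Since the kernel blows up at zero distance, the fit passes through every noisy label; varying `gamma` then changes how sharply the weights concentrate on nearby points, which is the single knob whose extremes the abstract associates with the catastrophic, benign, and tempered regimes.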