We prove a non-asymptotic distribution-independent lower bound for the expected mean squared generalization error caused by label noise in ridgeless linear regression. Our lower bound generalizes a similar known result to the overparameterized (interpolating) regime. In contrast to most previous works, our analysis applies to a broad class of input distributions with almost surely full-rank feature matrices, which allows us to cover various types of deterministic or random feature maps. Our lower bound is asymptotically sharp and implies that in the presence of label noise, ridgeless linear regression does not perform well around the interpolation threshold for any of these feature maps. We analyze the imposed assumptions in detail and provide a theory for analytic (random) feature maps. Using this theory, we can show that our assumptions are satisfied for input distributions with a (Lebesgue) density and feature maps given by random deep neural networks with analytic activation functions like sigmoid, tanh, softplus or GELU. As further examples, we show that feature maps from random Fourier features and polynomial kernels also satisfy our assumptions. We complement our theory with further experimental and analytic results.
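The behavior around the interpolation threshold can be illustrated numerically. The following is a minimal sketch, not the paper's experimental setup: it assumes isotropic Gaussian features (one special case of an input distribution with almost surely full-rank feature matrices), sets the target function to zero so that only label noise contributes, and uses a hypothetical helper name `noise_error`. For a fixed feature matrix X, the minimum-norm interpolant fitted to pure noise has coefficients equal to the pseudoinverse of X applied to the noise vector, so its expected squared test error under this feature distribution is sigma^2 times the sum of the inverse squared singular values of X.

```python
import numpy as np


def noise_error(n, p, sigma=1.0, trials=200, seed=0):
    """Monte Carlo estimate of the label-noise part of the test error.

    Assumption (ours, for illustration): features are i.i.d. N(0, 1), so the
    test error of a coefficient vector beta is ||beta||^2. With a zero target
    and noise of variance sigma^2, the ridgeless (minimum-norm) solution is
    beta = pinv(X) @ eps, and E_eps ||beta||^2 = sigma^2 * sum_i 1 / s_i^2,
    where s_i are the min(n, p) singular values of X.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(trials):
        X = rng.standard_normal((n, p))         # n samples, p features
        s = np.linalg.svd(X, compute_uv=False)  # min(n, p) singular values
        errors.append(sigma**2 * np.sum(1.0 / s**2))
    return float(np.mean(errors))


if __name__ == "__main__":
    n = 30
    for p in (5, 15, 25, 30, 35, 60, 120):
        print(f"n={n:3d}  p={p:3d}  noise-induced error ~ {noise_error(n, p):.3f}")
```

Under these assumptions, the estimate grows sharply as p approaches n from either side and is much smaller well inside the underparameterized or overparameterized regime. At p = n the sample mean is unstable, because the smallest singular value of a square Gaussian matrix can be arbitrarily close to zero.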