Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layer neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden-layer size to input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden-layer size is exponentially larger than the input dimension can the student approach perfect generalization.
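The setup described above can be illustrated with a minimal NumPy sketch. It is not the derivation used in this work: the tanh activation, the single-unit teacher, the online gradient-descent update, and all hyperparameters (`n_inputs`, `n_hidden`, `steps`, `lr`) are assumptions chosen only to make the example concrete. The student's first layer `frozen_w` is drawn once and never updated; only the second-layer weights `readout` are trained.

```python
import numpy as np


def features(frozen_w, x):
    # Random features: frozen first layer followed by a nonlinearity.
    # The tanh activation is an assumption made for this sketch.
    return np.tanh(x @ frozen_w.T / np.sqrt(x.shape[1]))


def teacher_output(teacher_w, x):
    # Hypothetical teacher: a single tanh unit acting on the same inputs.
    return np.tanh(x @ teacher_w / np.sqrt(x.shape[1]))


def generalization_error(teacher_w, frozen_w, readout, x_test):
    # Half the mean squared difference between student and teacher
    # outputs, estimated on a held-out set of Gaussian inputs.
    pred = features(frozen_w, x_test) @ readout
    return 0.5 * np.mean((pred - teacher_output(teacher_w, x_test)) ** 2)


def train_online(n_inputs=20, n_hidden=40, steps=4000, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    teacher_w = rng.standard_normal(n_inputs)
    frozen_w = rng.standard_normal((n_hidden, n_inputs))  # never updated
    readout = np.zeros(n_hidden)                          # trainable layer
    x_test = rng.standard_normal((2000, n_inputs))

    err_start = generalization_error(teacher_w, frozen_w, readout, x_test)
    for _ in range(steps):
        # Online learning: one fresh Gaussian example per step.
        x = rng.standard_normal((1, n_inputs))
        phi = features(frozen_w, x)[0]
        delta = phi @ readout - teacher_output(teacher_w, x)[0]
        # Gradient step on the squared error, second layer only.
        readout -= (lr / n_hidden) * delta * phi
    err_end = generalization_error(teacher_w, frozen_w, readout, x_test)
    return err_start, err_end


if __name__ == "__main__":
    start, end = train_online()
    print(f"generalization error: start={start:.4f}, end={end:.4f}")
```

Varying `n_hidden` at fixed `n_inputs` in this sketch gives a rough empirical view of how the residual error depends on the ratio of hidden-layer size to input dimension, although a small simulation of this kind is no substitute for the asymptotic analysis.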