Deep neural networks are widely used prediction algorithms whose performance often improves as the number of weights increases, leading to over-parametrization. We consider a two-layer neural network whose first layer is frozen while the last layer is trainable, known as the random feature model. We study over-parametrization in a student-teacher framework by deriving a set of differential equations for the learning dynamics. For any finite ratio of hidden layer size to input dimension, the student cannot generalize perfectly, and we compute the non-zero asymptotic generalization error. Only when the student's hidden layer size is exponentially larger than the input dimension is an approach to perfect generalization possible.
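The random feature model in a student-teacher setting can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the teacher is a single `tanh` unit, the student uses `tanh` features with a `1/sqrt(n_hidden)` output scaling, training is plain online SGD on the squared error, and the function names and default sizes (`train_student`, `n_inputs`, `n_hidden`, `lr`) are hypothetical. The generalization error is estimated on a held-out sample, not computed from the differential equations the abstract refers to.

```python
import numpy as np


def random_features(x, first_layer):
    """Frozen first layer: project inputs and apply a tanh nonlinearity (assumed)."""
    n_inputs = x.shape[-1]
    return np.tanh(x @ first_layer.T / np.sqrt(n_inputs))


def train_student(n_inputs=20, n_hidden=100, steps=10_000, lr=0.2, seed=0):
    """Online SGD on the last layer only; returns (initial_error, final_error)."""
    rng = np.random.default_rng(seed)

    # Hypothetical teacher: one tanh unit with Gaussian weights.
    teacher_weights = rng.standard_normal(n_inputs)

    def teacher(x):
        return np.tanh(x @ teacher_weights / np.sqrt(n_inputs))

    # Student: the first layer is drawn once and never updated.
    first_layer = rng.standard_normal((n_hidden, n_inputs))
    # Only these last-layer weights are trained.
    last_layer = np.zeros(n_hidden)

    # Held-out sample used to estimate the generalization error.
    x_test = rng.standard_normal((2000, n_inputs))
    y_test = teacher(x_test)
    phi_test = random_features(x_test, first_layer)

    def generalization_error(weights):
        prediction = phi_test @ weights / np.sqrt(n_hidden)
        return 0.5 * np.mean((prediction - y_test) ** 2)

    initial_error = generalization_error(last_layer)

    for _ in range(steps):
        # One fresh example per step (online learning).
        x = rng.standard_normal(n_inputs)
        phi = random_features(x, first_layer)
        residual = phi @ last_layer / np.sqrt(n_hidden) - teacher(x)
        # Gradient of 0.5 * residual**2 with respect to the last layer.
        last_layer -= lr * residual * phi / np.sqrt(n_hidden)

    return initial_error, generalization_error(last_layer)


if __name__ == "__main__":
    e0, e1 = train_student()
    print(f"generalization error before training: {e0:.4f}")
    print(f"generalization error after training:  {e1:.4f}")
```

Varying `n_hidden` at fixed `n_inputs` in this sketch gives a rough empirical feel for how the ratio of hidden layer size to input dimension affects the residual error, though a finite simulation like this cannot by itself confirm the asymptotic results stated above.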