We study the training dynamics of a shallow neural network with quadratic activation functions and quadratic cost in a teacher-student setup. In line with previous works on the same neural architecture, the optimization is performed following the gradient flow on the population risk, where the average over data points is replaced by the expectation over their distribution, assumed to be Gaussian. We first derive convergence properties for the gradient flow and quantify the overparameterization that is necessary to achieve strong signal recovery. Then, assuming that the teachers and the students at initialization form independent orthonormal families, we derive a high-dimensional limit for the flow and show that the minimal overparameterization is sufficient for strong recovery. We verify by numerical experiments that these results hold for more general initializations.
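To make the setup concrete, the following is a minimal sketch of a discretized gradient flow on the population risk, written under assumptions that are not taken from the abstract: standard Gaussian inputs x ~ N(0, I), an unnormalized network output equal to the sum of the squared projections (w_i · x)^2, and explicit Euler steps in place of the continuous flow. Under these assumptions the population risk has the closed form E[(x^T Δ x)^2] = 2‖Δ‖_F^2 + (tr Δ)^2, where Δ = W^T W − W*^T W* is the difference between the student and teacher matrices. All function names, the dimensions, the step size, and the number of steps are illustrative choices.

```python
import numpy as np


def orthonormal_rows(k, d, rng):
    """Return a k x d matrix with orthonormal rows (requires k <= d)."""
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # q is d x k
    return q.T


def population_risk(W, A_star):
    """Closed-form E[(x^T Delta x)^2] for x ~ N(0, I) (assumed normalization)."""
    delta = W.T @ W - A_star
    return 2.0 * np.sum(delta ** 2) + np.trace(delta) ** 2


def risk_gradient(W, A_star):
    """Gradient of population_risk with respect to the student weights W."""
    delta = W.T @ W - A_star
    return 8.0 * W @ delta + 4.0 * np.trace(delta) * W


def run_flow(d=8, m_star=2, m=4, step=0.005, n_steps=2000, seed=0):
    """Euler discretization of the gradient flow (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    # Teachers and initial students drawn as independent orthonormal families.
    W_star = orthonormal_rows(m_star, d, rng)
    W = orthonormal_rows(m, d, rng)
    A_star = W_star.T @ W_star
    risks = [population_risk(W, A_star)]
    for _ in range(n_steps):
        W = W - step * risk_gradient(W, A_star)
        risks.append(population_risk(W, A_star))
    return W, A_star, risks


if __name__ == "__main__":
    W, A_star, risks = run_flow()
    print("initial risk:", risks[0])
    print("final risk:  ", risks[-1])
```

Varying `m` relative to `m_star` in `run_flow` gives a rough way to explore how the amount of overparameterization affects recovery in this toy setting. It is not a reproduction of the experiments mentioned above.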