Few neural architectures lend themselves to provable learning with gradient-based methods. One popular model is the single-index model, in which labels are produced by composing an unknown linear projection with a possibly unknown scalar link function. Learning this model with SGD is relatively well understood: the so-called information exponent of the link function governs a polynomial sample complexity rate. However, extending this analysis to deeper or more complicated architectures remains challenging. In this work, we consider single-index learning in the setting of symmetric neural networks. Under analytic assumptions on the activation and maximum degree assumptions on the link function, we prove that gradient flow recovers the hidden planted direction, represented as a finitely supported vector in the feature space of power sum polynomials. We characterize a notion of information exponent adapted to our setting that controls the efficiency of learning.
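For orientation, the classical notion of information exponent can be sketched numerically. The sketch below assumes the standard Gaussian single-index setting, where the information exponent of a link function is the index of its first nonzero Hermite coefficient of degree at least one; it is not the adapted notion introduced in this work for symmetric networks. The function names, quadrature size, and tolerance are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def hermite_coefficients(link, max_degree, quad_points=64):
    """Approximate c_k = E[link(z) He_k(z)] / k! for z ~ N(0, 1), k = 0..max_degree."""
    # Gauss-Hermite nodes and weights for the weight function exp(-z^2 / 2).
    nodes, weights = hermegauss(quad_points)
    # Normalize so the weights integrate against the standard Gaussian density.
    weights = weights / np.sqrt(2.0 * np.pi)
    values = link(nodes)
    coeffs = []
    for k in range(max_degree + 1):
        basis = np.zeros(k + 1)
        basis[k] = 1.0  # selects the probabilists' Hermite polynomial He_k
        he_k = hermeval(nodes, basis)
        coeffs.append(np.sum(weights * values * he_k) / math.factorial(k))
    return np.array(coeffs)


def information_exponent(link, max_degree=8, tol=1e-8):
    """Smallest k >= 1 with a nonzero Hermite coefficient, or None if none is found."""
    coeffs = hermite_coefficients(link, max_degree)
    for k in range(1, max_degree + 1):
        if abs(coeffs[k]) > tol:
            return k
    return None


def sample_single_index(link, direction, num_samples, rng):
    """Draw Gaussian inputs x and labels y = link(<direction, x>)."""
    x = rng.standard_normal((num_samples, direction.shape[0]))
    return x, link(x @ direction)
```

As a usage example under the same assumptions, `information_exponent(lambda z: z**2 - 1)` inspects the link He_2, whose first nonzero coefficient sits at degree two, while an odd link such as `np.tanh` already has a nonzero degree-one coefficient.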