Understanding the mechanisms by which neural networks extract statistics from input-label pairs through feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the Gram matrices of the weights (the neural feature matrices, NFMs) and the average gradient outer products (AGOPs) become correlated during training, a statement known as the Neural Feature Ansatz (NFA). Through the NFA, these works introduce mapping the data with the AGOP as a general mechanism for neural feature learning. However, they do not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation and explain its emergence. We show that the correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that this alignment is driven by the interaction of the weight changes induced by SGD with the pre-activation features, and we analyze the resulting dynamics analytically at early training times in terms of simple statistics of the inputs and labels. We prove that the derivative alignment occurs almost surely in specific high-dimensional settings. Finally, motivated by our analysis of the centered correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of the learned features.
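For concreteness, the following is a minimal sketch, assuming a small PyTorch MLP on synthetic data, of the quantities named above: the first-layer neural feature matrix W1^T W1, the AGOP (the average over inputs of the outer product of the network's input gradient with itself), and their Frobenius cosine similarity, which is the NFA correlation. The model, data, and helper names (`agop`, `cosine`) are illustrative and not the paper's code.

```python
import torch

torch.manual_seed(0)
d, k, n = 20, 64, 256                        # input dim, hidden width, number of samples
X = torch.randn(n, d)                        # synthetic inputs (assumption: standard Gaussian)

net = torch.nn.Sequential(
    torch.nn.Linear(d, k, bias=False),
    torch.nn.ReLU(),
    torch.nn.Linear(k, 1, bias=False),
)

def agop(model, X):
    """Average gradient outer product: mean over samples of grad f(x) grad f(x)^T."""
    G = torch.zeros(X.shape[1], X.shape[1])
    for x in X:
        x = x.clone().requires_grad_(True)
        (g,) = torch.autograd.grad(model(x).squeeze(), x)
        G += torch.outer(g, g)
    return G / X.shape[0]

def cosine(A, B):
    """NFA correlation: cosine similarity between the flattened matrices."""
    return (A * B).sum() / (A.norm() * B.norm())

W1 = net[0].weight.detach()                  # first-layer weights, shape (k, d)
nfm = W1.T @ W1                              # neural feature matrix of layer 1, shape (d, d)
print("NFA correlation at initialization:", cosine(nfm, agop(net, X)).item())
```

The NFA states that this correlation grows during training; tracking it per layer over SGD steps (and with inputs mapped to the layer's input space for deeper layers) reproduces the quantity the abstract refers to.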