Understanding the mechanisms through which neural networks extract statistics from input-label pairs via feature learning is one of the most important unsolved problems in supervised learning. Prior works demonstrated that the Gram matrices of the weights (the neural feature matrices, NFMs) and the average gradient outer products (AGOPs) become correlated during training, a statement known as the neural feature ansatz (NFA). Through the NFA, those works introduced mapping with the AGOP as a general mechanism for neural feature learning; however, they did not provide a theoretical explanation for this correlation or its origins. In this work, we further clarify the nature of this correlation and explain its emergence. We show that the correlation is equivalent to alignment between the left singular structure of the weight matrices and the newly defined pre-activation tangent features at each layer. We further establish that this alignment is driven by the interaction of the weight changes induced by SGD with the pre-activation features, and we analyze the resulting dynamics analytically at early training times in terms of simple statistics of the inputs and labels. Finally, motivated by the observation that the NFA is driven by a centered version of this correlation, we introduce a simple optimization rule that dramatically increases the NFA correlations at any given layer and improves the quality of the learned features.
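To make the quantities in the ansatz concrete, the following is a minimal numpy sketch (not the authors' code) that computes the first-layer NFM, the AGOP of the network function with respect to its inputs, and their correlation measured as a cosine similarity between the flattened matrices. The two-layer ReLU network, its random initialization, and the Gaussian input distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-layer ReLU network f(x) = v^T relu(W x).
d, h, n = 10, 32, 500                      # input dim, width, sample count
W = rng.normal(size=(h, d)) / np.sqrt(d)   # first-layer weights
v = rng.normal(size=h) / np.sqrt(h)        # second-layer weights
X = rng.normal(size=(n, d))                # sample inputs

# Neural feature matrix (NFM): Gram matrix of the first-layer weights.
nfm = W.T @ W

# AGOP: average outer product of the input gradients of f.
# For f(x) = v^T relu(W x), grad_x f = W^T (v * 1[W x > 0]).
agop = np.zeros((d, d))
for x in X:
    g = W.T @ (v * (W @ x > 0))
    agop += np.outer(g, g)
agop /= n

def mat_cos(A, B):
    """Cosine similarity between matrices, treated as flattened vectors."""
    return (A * B).sum() / (np.linalg.norm(A) * np.linalg.norm(B))

rho = mat_cos(nfm, agop)
print(f"NFA correlation: {rho:.3f}")
```

The NFA asserts that this correlation grows toward one during training; tracking `rho` per layer over SGD steps reproduces the measurement the ansatz is stated in terms of.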