On the Neural Feature Ansatz for Deep Neural Networks

Understanding feature learning is an important open question in establishing a mathematical foundation for deep neural networks. The Neural Feature Ansatz (NFA) states that after training, the Gram matrix of the first-layer weights of a deep neural network is proportional to some power $\alpha>0$ of the average gradient outer product (AGOP) of this network with respect to its inputs. Assuming gradient flow dynamics with balanced weight initialization, the NFA was proven to hold throughout training for two-layer linear networks with exponent $\alpha = 1/2$ (Radhakrishnan et al., 2024). We extend this result to networks with $L \geq 2$ layers, showing that the NFA holds with exponent $\alpha = 1/L$, which shows that the exponent in the NFA depends on depth. Furthermore, we prove that for unbalanced initialization, the NFA holds asymptotically as training proceeds if weight decay is applied. We also provide counterexamples showing that the NFA does not hold for some network architectures with nonlinear activations, even when these networks fit the training data arbitrarily well. We thoroughly validate our theoretical results through numerical experiments across a variety of optimization algorithms, weight decay rates, and initialization schemes.
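A minimal NumPy sketch (not the authors' code) of the two linear-network claims above, under stated assumptions. All layers are square $d \times d$ matrices. For a linear map $x \mapsto Mx$ with $M = W_L \cdots W_1$, the Jacobian is $M$ at every input, so the AGOP is $M^\top M$. If the weights are balanced, $W_{l+1}^\top W_{l+1} = W_l W_l^\top$ for all $l$, then telescoping gives $M^\top M = (W_1^\top W_1)^L$, that is, $W_1^\top W_1 = \mathrm{AGOP}^{1/L}$. Proportionality is measured here by the cosine similarity of the flattened matrices, which is an illustrative choice. For the weight-decay case, plain gradient descent with a small step stands in for gradient flow, and the loss is assumed to be $\tfrac12\|M - B\|_F^2$, which corresponds to whitened inputs. The helpers `balanced_weights` and `train`, the teacher matrix $B$, and all hyperparameters are hypothetical. Under gradient flow with an L2 penalty of strength $\lambda$, each imbalance $W_{l+1}^\top W_{l+1} - W_l W_l^\top$ decays like $e^{-2\lambda t}$, which is why the unbalanced run is expected to drift toward the NFA.

```python
import numpy as np


def chain(ws, d):
    """Return W_k ... W_1 for ws = [W_1, ..., W_k]; the d x d identity if ws is empty."""
    out = np.eye(d)
    for w in ws:
        out = w @ out
    return out


def psd_power(a, alpha):
    """Matrix power of a symmetric PSD matrix via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(a)
    vals = np.clip(vals, 0.0, None)  # remove tiny negative round-off
    return (vecs * vals ** alpha) @ vecs.T


def cosine(a, b):
    """Cosine similarity of flattened matrices; equals 1 iff a = c * b with c > 0."""
    return float(np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def nfa_cosine(ws, alpha):
    """Compare the first-layer Gram matrix W_1^T W_1 with AGOP^alpha."""
    d = ws[0].shape[1]
    m = chain(ws, d)
    agop = m.T @ m  # the Jacobian of x -> M x is M at every input x
    nfm = ws[0].T @ ws[0]
    return cosine(nfm, psd_power(agop, alpha))


def balanced_weights(s, depth, rng):
    """Hypothetical helper: W_l = U_{l+1} diag(s^(1/L)) U_l^T with orthogonal U_l,
    so W_{l+1}^T W_{l+1} = W_l W_l^T holds exactly for every l."""
    d = len(s)
    us = [np.linalg.qr(rng.standard_normal((d, d)))[0] for _ in range(depth + 1)]
    core = np.diag(s ** (1.0 / depth))
    return [us[l + 1] @ core @ us[l].T for l in range(depth)]


def train(ws, target, lr, wd, steps):
    """Gradient descent (a discretization of gradient flow) on
    0.5 * ||W_L ... W_1 - target||_F^2 + 0.5 * wd * sum_l ||W_l||_F^2."""
    d = target.shape[0]
    ws = [w.copy() for w in ws]
    for _ in range(steps):
        resid = chain(ws, d) - target  # gradient of the data term w.r.t. M
        grads = [chain(ws[l + 1:], d).T @ resid @ chain(ws[:l], d).T
                 for l in range(len(ws))]
        ws = [w - lr * (g + wd * w) for w, g in zip(ws, grads)]
    return ws


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = np.array([0.25, 0.5, 1.0, 2.0, 4.0])  # illustrative shared singular values
    for depth in (2, 3, 4):
        ws = balanced_weights(s, depth, rng)
        print(f"L={depth}: cos(alpha=1/L)={nfa_cosine(ws, 1.0 / depth):.6f}, "
              f"cos(alpha=1/2)={nfa_cosine(ws, 0.5):.6f}")

    # Unbalanced Gaussian init, trained with weight decay on a hypothetical teacher B.
    d, depth = 4, 3
    teacher = np.diag([2.0, 1.5, 1.0, 0.5])
    init = [0.25 * rng.standard_normal((d, d)) for _ in range(depth)]
    trained = train(init, teacher, lr=0.05, wd=0.1, steps=4000)
    print(f"init cos={nfa_cosine(init, 1 / depth):.4f}, "
          f"trained cos={nfa_cosine(trained, 1 / depth):.6f}")
```

With exactly balanced weights, the $\alpha = 1/L$ cosine should equal 1 up to round-off. With $\alpha = 1/2$, the cosine drops below 1 once $L > 2$, because the chosen singular values are unequal. In the weight-decay run, the trained cosine should end close to 1 even though the initialization is unbalanced. This sketch only covers the linear setting and says nothing about the nonlinear counterexamples.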
