Recent works have shown that traditional Neural Network (NN) architectures display a marked frequency bias during training: the NN learns low-frequency features before high-frequency ones. In this study, we rigorously derive a partial differential equation (PDE) that governs the frequency dynamics of the error for a 2-layer NN in the Neural Tangent Kernel (NTK) regime. Building on this insight, we explicitly demonstrate how an appropriate choice of distribution for the initialization weights can eliminate or control the frequency bias. We focus our study on the Fourier Features model, an NN whose first layer has sine and cosine activation functions, with frequencies sampled from a prescribed distribution. In this setup, we experimentally validate our theoretical results and compare the NN dynamics to the PDE solution computed with the finite element method. Finally, we empirically show that the same principle extends to multi-layer NNs.
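For concreteness, a minimal sketch of a two-layer Fourier Features model of the kind described above; the width $m$, the scaling $1/\sqrt{m}$, and the symbols $a_k$, $b_k$, $\omega_k$, $\rho$ are illustrative conventions and not necessarily the paper's notation:
\[
f(x) \;=\; \frac{1}{\sqrt{m}} \sum_{k=1}^{m} \Big( a_k \cos\big(\langle \omega_k, x \rangle\big) + b_k \sin\big(\langle \omega_k, x \rangle\big) \Big),
\qquad \omega_k \overset{\text{i.i.d.}}{\sim} \rho ,
\]
where the frequencies $\omega_k$ are drawn from the prescribed distribution $\rho$ at initialization. A common convention in such random-features setups is to keep the sampled $\omega_k$ fixed and train only the outer weights $a_k, b_k$, though the paper's exact training setup may differ; in this picture, the choice of $\rho$ is the knob that controls which frequencies the model can fit quickly.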