In this paper, we derive diffusion equation models in the spectral domain for the evolution of training errors of two-layer multi-scale deep neural networks (MscaleDNNs) \cite{caixu2019,liu2020multi}, which are designed to reduce the spectral bias of fully connected deep neural networks in approximating oscillatory functions. The diffusion models are obtained from the spectral form of the error equation of the MscaleDNN, which is derived using a neural tangent kernel approach for gradient descent training with a sine activation function, in the limit of vanishing learning rate, infinite network width, and infinite domain size. The diffusion coefficients are shown to have larger supports when more scales are used in the MscaleDNN; the proposed frequency-domain diffusion models thus explain the MscaleDNN's ability to reduce spectral bias. Numerical results from the diffusion models for two-layer MscaleDNN training match the error evolution of actual gradient descent training with a reasonably large network width, validating the diffusion models. The numerical results also show error decay over a wide frequency range, confirming the advantage of the MscaleDNN in approximating functions with a wide range of frequencies.
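To make the setting above concrete, the following is a minimal NumPy sketch of gradient descent training for a two-layer network with a sine activation in which each hidden neuron's input is multiplied by one of several fixed scales. It is an illustration under stated assumptions, not the construction used in the paper: the function names (`init_mscale`, `forward`, `loss`, `gd_step`), the even split of neurons across scales, the $1/\sqrt{m}$ output normalization, the initialization, the target function, and the learning rate are all hypothetical choices made here for the example.

```python
import numpy as np

def init_mscale(width, scales, rng):
    """Hypothetical two-layer multi-scale net: neurons split evenly across scales."""
    scales = np.asarray(scales, dtype=float)
    s = np.repeat(scales, width // scales.size)  # fixed scale per neuron
    m = s.size
    return {
        "s": s,                                   # scales (not trained)
        "w": rng.normal(size=m),                  # input weights
        "b": rng.uniform(-np.pi, np.pi, size=m),  # biases
        "a": rng.normal(size=m),                  # output weights
    }

def forward(p, x):
    """f(x) = (1/sqrt(m)) * sum_k a_k * sin(s_k * w_k * x + b_k)."""
    z = np.outer(x, p["s"] * p["w"]) + p["b"]     # shape (n, m)
    return np.sin(z) @ p["a"] / np.sqrt(p["s"].size)

def loss(p, x, y):
    """Half mean squared error over the sample points."""
    return 0.5 * np.mean((forward(p, x) - y) ** 2)

def gd_step(p, x, y, lr):
    """One full-batch gradient descent step (in place); returns the pre-step loss."""
    m, n = p["s"].size, x.size
    z = np.outer(x, p["s"] * p["w"]) + p["b"]
    sin_z, cos_z = np.sin(z), np.cos(z)
    r = sin_z @ p["a"] / np.sqrt(m) - y           # residual at each sample
    df_dz = cos_z * p["a"] / np.sqrt(m)           # d f / d z, shape (n, m)
    grad_a = sin_z.T @ r / (n * np.sqrt(m))
    grad_b = df_dz.T @ r / n
    grad_w = (df_dz * np.outer(x, p["s"])).T @ r / n
    p["a"] -= lr * grad_a
    p["b"] -= lr * grad_b
    p["w"] -= lr * grad_w
    return 0.5 * np.mean(r ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(-1.0, 1.0, 64)
    y = np.sin(2.0 * x) + 0.5 * np.sin(12.0 * x)  # assumed two-frequency target
    params = init_mscale(200, [1, 2, 4, 8], rng)
    for step in range(300):
        current = gd_step(params, x, y, lr=0.01)
        if step % 100 == 0:
            print(step, current)
```

Passing `scales=[1]` gives an ordinary single-scale two-layer network, so the same script can be used to compare how the residual decays with and without the additional scales.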