Neural operators aim to approximate operators that map between Banach spaces of functions, and they have achieved considerable success in scientific computing. Compared with certain deep learning-based solvers that target one PDE instance at a time, such as Physics-Informed Neural Networks (PINNs) and the Deep Ritz Method (DRM), neural operators can solve an entire class of Partial Differential Equations (PDEs). Although much work has analyzed the approximation and generalization errors of neural operators, their training error remains largely unexamined. In this work, we analyze the convergence of gradient descent for wide, shallow neural operators within the Neural Tangent Kernel (NTK) framework. The core idea is that over-parameterization and random initialization together ensure that each weight vector remains close to its initialization throughout all iterations, which yields linear convergence of gradient descent. Under this over-parameterized setting, we show that gradient descent finds a global minimum in both continuous and discrete time.
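The mechanism described above, over-parameterization keeping weights near initialization while the loss decays geometrically, can be observed numerically. Below is a minimal NumPy sketch (not the paper's construction; the width `m`, learning rate, and toy regression data are illustrative assumptions) that trains the first layer of a wide shallow ReLU network by gradient descent and measures both the loss decay and the per-neuron weight drift from initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: n unit-norm inputs in d dimensions (illustrative).
n, d, m = 10, 5, 4096            # m: hidden width, deliberately over-parameterized
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.standard_normal(n)

# Shallow network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x).
# Only the first layer W is trained; the output signs a_r stay fixed,
# as is common in NTK-style analyses of shallow networks.
W0 = rng.standard_normal((m, d))
a = rng.choice([-1.0, 1.0], size=m)
W = W0.copy()

lr = 0.1
losses = []
for t in range(500):
    pre = X @ W.T                          # (n, m) pre-activations
    out = np.maximum(pre, 0) @ a / np.sqrt(m)
    err = out - y
    losses.append(0.5 * np.sum(err**2))
    # Gradient of 0.5 * ||f(X) - y||^2 with respect to W:
    # dL/dw_r = (1/sqrt(m)) * a_r * sum_i err_i * 1[w_r . x_i > 0] * x_i
    grad = ((err[:, None] * (pre > 0)) * a[None, :]).T @ X / np.sqrt(m)
    W -= lr * grad

# In the over-parameterized regime, each neuron barely moves from its start.
drift = np.max(np.linalg.norm(W - W0, axis=1))
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3e}, max per-neuron drift: {drift:.4f}")
```

Running this, the training loss shrinks by orders of magnitude while the maximum per-neuron weight movement stays small relative to the O(sqrt(d)) norm of a freshly initialized weight vector, which is exactly the lazy-training behavior the NTK analysis formalizes.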