We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. A DeepONet consists of two sub-networks, the branch network and the trunk network. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high-dimensional space. The nonconvex and nonlinear nature of this problem makes training very challenging. To address this challenge, we propose a two-step training method that trains the trunk network first and then the branch network. Motivated by the divide-and-conquer paradigm, the core mechanism decomposes the entire training task into two subtasks of reduced complexity. The method incorporates the Gram-Schmidt orthonormalization process, which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of the DeepONet, and the number of input and output sensors. Numerical examples, including Darcy flow in heterogeneous porous media, are presented to demonstrate the effectiveness of the two-step training method.
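As a rough illustration of the Gram-Schmidt orthonormalization step mentioned in the abstract, the sketch below orthonormalizes a few vectors standing in for trunk-network basis functions sampled at output sensors. The function name `gram_schmidt`, the sample values in `trunk_columns`, and the choice to treat trunk outputs as column vectors are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def gram_schmidt(columns, tol=1e-12):
    """Orthonormalize a list of equal-length vectors (modified Gram-Schmidt)."""
    basis = []
    for v in columns:
        w = list(v)
        # Remove the components of w along every basis vector found so far.
        for q in basis:
            coeff = sum(qi * wi for qi, wi in zip(q, w))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        # Skip vectors that are (numerically) linearly dependent on the basis.
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


# Hypothetical trunk outputs: 3 basis functions sampled at 4 output sensors.
trunk_columns = [
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 1.0, 2.0, 3.0],
    [0.0, 1.0, 4.0, 9.0],
]
q = gram_schmidt(trunk_columns)
```

The abstract does not spell out how the orthonormalized basis enters the branch-network training, so the sketch stops at the orthonormalization itself.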