On the training and generalization of deep operator networks

We present a novel training method for deep operator networks (DeepONets), one of the most popular neural network models for operators. A DeepONet consists of two sub-networks, the branch network and the trunk network. Typically, the two sub-networks are trained simultaneously, which amounts to solving a complex optimization problem in a high-dimensional space. The nonconvex and nonlinear nature of this problem makes training very challenging. To tackle this challenge, we propose a two-step training method that trains the trunk network first and then trains the branch network. The core mechanism, motivated by the divide-and-conquer paradigm, is the decomposition of the entire training task into two subtasks of reduced complexity. The Gram-Schmidt orthonormalization process is introduced within this procedure, which significantly improves stability and generalization ability. On the theoretical side, we establish a generalization error estimate in terms of the number of training data, the width of DeepONets, and the number of input and output sensors. Numerical examples, including Darcy flow in heterogeneous porous media, are presented to demonstrate the effectiveness of the two-step training method.
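The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a trunk-first, branch-second procedure with an orthonormalization step might be arranged. Everything specific in it is an assumption made for illustration: the toy antiderivative operator, the fixed random tanh features standing in for a trained trunk network, the linear map standing in for the branch network, and all names (`sample`, `trunk`, `Phi`, `A`, `W`). Both steps are reduced to linear least squares here, whereas actual DeepONets train neural networks with gradient-based optimizers.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumption): u(x) = a*sin(pi x) + b*cos(pi x),
# and G(u)(y) = integral of u over [0, y].
m, n_y, K, N = 30, 50, 100, 16   # input sensors, output sensors, samples, trunk width
x = np.linspace(0.0, 1.0, m)     # input sensor locations
y = np.linspace(0.0, 1.0, n_y)   # output sensor locations

def sample(num):
    """Return input sensor values V (num, m) and outputs U (n_y, num)."""
    ab = rng.normal(size=(num, 2))
    a, b = ab[:, :1], ab[:, 1:]
    V = a * np.sin(np.pi * x) + b * np.cos(np.pi * x)
    U = (a * (1.0 - np.cos(np.pi * y)) + b * np.sin(np.pi * y)) / np.pi
    return V, U.T

# Stand-in for the trunk network (assumption): fixed random tanh features
# plus a constant column. A real trunk would have trainable parameters.
w = rng.normal(scale=3.0, size=N)
c = rng.uniform(-3.0, 3.0, size=N)

def trunk(pts):
    return np.column_stack([np.ones_like(pts), np.tanh(np.outer(pts, w) + c)])

V_train, U_train = sample(K)

# Step 1: fit the trunk side. Find A so that Phi @ A approximates U_train.
Phi = trunk(y)                                      # (n_y, N + 1)
A, *_ = np.linalg.lstsq(Phi, U_train, rcond=None)   # (N + 1, K)

# Orthonormalize the trunk basis. np.linalg.qr uses Householder reflections,
# which span the same space as a Gram-Schmidt basis.
Q, R = np.linalg.qr(Phi)                            # Phi = Q @ R
targets = (R @ A).T                                 # (K, N + 1) coefficients in the Q basis

# Step 2: fit the branch side to map sensor values to those coefficients.
W, *_ = np.linalg.lstsq(V_train, targets, rcond=None)   # (m, N + 1)

# Evaluate on unseen input functions at the output sensors.
V_test, U_test = sample(20)
pred = Q @ (V_test @ W).T                           # (n_y, 20)
rel_err = np.linalg.norm(pred - U_test) / np.linalg.norm(U_test)
print(f"relative test error: {rel_err:.2e}")
```

In this sketch the branch fits coefficients with respect to the orthonormal columns of `Q` rather than the raw trunk features, which keeps the second least-squares problem well conditioned even when the columns of `Phi` are nearly linearly dependent.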

