Neural operators (NOs) employ deep neural networks to learn mappings between infinite-dimensional function spaces. The deep operator network (DeepONet), a popular NO architecture, has demonstrated success in the real-time prediction of complex dynamics across various scientific and engineering applications. In this work, we introduce a random sampling technique for DeepONet training, aimed at improving the model's generalization ability while significantly reducing computational time. The proposed approach targets the trunk network of DeepONet, which outputs the basis functions evaluated at spatiotemporal locations in the bounded domain on which the physical system is defined. Traditionally, when constructing the loss function, DeepONet training evaluates all output functions on a uniform grid of spatiotemporal points at every iteration. This effectively enlarges the batch size and, owing to the known limitations of large-batch stochastic gradient descent (SGD), leads to poorer generalization and increased memory demands. Randomly sampling the inputs of the trunk net mitigates these challenges, improving generalization and reducing memory requirements during training, and thereby yields significant computational gains. We validate this hypothesis on three benchmark examples, demonstrating substantial reductions in training time while achieving comparable or lower overall test errors relative to the traditional training approach. Our results indicate that randomizing the trunk-network inputs during training enhances the efficiency and robustness of DeepONet, offering a promising avenue for improving the framework's performance in modeling complex physical systems.
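To make the idea concrete, the following is a minimal sketch of the trunk-input subsampling strategy described above. It is written in PyTorch under illustrative assumptions: the network sizes, the variable names (`n_sub`, `y_grid`, etc.), the optimizer choice, and the synthetic data are ours, not the paper's implementation. The point it demonstrates is that each optimizer step evaluates the loss on a fresh random subset of the spatiotemporal grid rather than on every grid point.

```python
# Minimal sketch of random trunk-input subsampling for DeepONet training.
# All names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, branch_dim, coord_dim, p=64):
        super().__init__()
        # Branch net: encodes the input function sampled at branch_dim sensors.
        self.branch = nn.Sequential(
            nn.Linear(branch_dim, 128), nn.Tanh(), nn.Linear(128, p))
        # Trunk net: outputs p basis functions at each spatiotemporal point y.
        self.trunk = nn.Sequential(
            nn.Linear(coord_dim, 128), nn.Tanh(), nn.Linear(128, p))

    def forward(self, u, y):
        # u: (batch, branch_dim); y: (n_pts, coord_dim)
        b = self.branch(u)   # (batch, p) coefficients
        t = self.trunk(y)    # (n_pts, p) basis functions
        return b @ t.T       # (batch, n_pts) predicted output G(u)(y)

model = DeepONet(branch_dim=100, coord_dim=2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # any SGD-type optimizer

# Synthetic placeholder data: N input functions, a full grid of P points,
# and the corresponding targets on that grid.
N, P = 512, 4096
u_train = torch.randn(N, 100)   # (N, branch_dim) input functions
y_grid = torch.rand(P, 2)       # (P, coord_dim) full spatiotemporal grid
s_train = torch.randn(N, P)     # (N, P) target outputs on the grid

n_sub = 256  # number of randomly sampled trunk inputs per iteration
for step in range(1000):
    idx = torch.randperm(P)[:n_sub]     # fresh random subset each step
    pred = model(u_train, y_grid[idx])  # loss evaluated only at n_sub points
    loss = ((pred - s_train[:, idx]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Under these assumptions, shrinking the trunk batch from P to n_sub points per step reduces both the per-iteration memory footprint and the effective batch size seen by the stochastic optimizer, which is the mechanism the abstract credits for the improved generalization and training speed.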