Fourier neural operators (FNOs) are a recently introduced neural network architecture for learning solution operators of partial differential equations (PDEs) and have been shown to perform significantly better than comparable deep learning approaches. Once trained, FNOs can achieve speed-ups of multiple orders of magnitude over conventional numerical PDE solvers. However, due to the high dimensionality of their input data and network weights, FNOs have so far only been applied to two-dimensional or small three-dimensional problems. To remove this problem-size limitation, we propose a model-parallel version of FNOs based on domain decomposition of both the input data and network weights. We demonstrate that our model-parallel FNO can predict time-varying PDE solutions of over 2.6 billion variables on Perlmutter using up to 512 A100 GPUs, and we show an example of training a distributed FNO on the Azure cloud for simulating multiphase CO$_2$ dynamics in the Earth's subsurface.
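To illustrate what domain decomposition of both the data and the weights can look like for a Fourier layer, the following is a minimal single-process sketch. It is not the implementation described here: it simulates the workers with Python lists, assumes a simple slab decomposition, uses a single channel, and keeps every Fourier mode. The names `distributed_spectral_multiply` and `n_ranks` are hypothetical, and the re-partition step stands in for the all-to-all communication a real multi-GPU system would need.

```python
import numpy as np

def distributed_spectral_multiply(x, w, n_ranks):
    """Simulate a slab-decomposed 2D FFT followed by a per-mode weight multiply.

    x: real input of shape (nx, ny); w: complex spectral weights of shape (nx, ny).
    Each list element below plays the role of one worker's local data.
    """
    # Step 1: partition the input along axis 0; each simulated rank holds one slab.
    slabs = np.array_split(x, n_ranks, axis=0)
    # Step 2: every rank transforms the axis it holds in full (axis 1).
    slabs = [np.fft.fft(s, axis=1) for s in slabs]
    # Step 3: re-partition along axis 1 so that axis 0 becomes locally complete.
    # In a real distributed setting this would be an all-to-all exchange.
    cols = np.array_split(np.concatenate(slabs, axis=0), n_ranks, axis=1)
    # Step 4: the weights are partitioned the same way, so each rank only
    # stores and applies the weights for the Fourier modes it owns.
    w_cols = np.array_split(w, n_ranks, axis=1)
    out = [np.fft.fft(c, axis=0) * wc for c, wc in zip(cols, w_cols)]
    # Gather only to compare against the single-device reference.
    return np.concatenate(out, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 12))
w = rng.standard_normal((16, 12)) + 1j * rng.standard_normal((16, 12))

result = distributed_spectral_multiply(x, w, n_ranks=4)
reference = np.fft.fft2(x) * w
max_err = float(np.max(np.abs(result - reference)))
```

In this sketch no simulated rank ever needs the full weight array, only the slice matching its own modes. The full input and output are assembled only to check the result against `np.fft.fft2`.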