In this study, we introduce a domain-decomposition-based distributed training and inference approach for message-passing neural networks (MPNNs). Our objective is to address the challenge of scaling edge-based graph neural networks as the number of nodes increases. Coupling our distributed training approach with a Nystr\"om-approximation sampling technique, we present a scalable graph neural network, referred to as DS-MPNN (D and S standing for distributed and sampled, respectively), capable of scaling up to $O(10^5)$ nodes. We validate our sampling and distributed training approach on two cases: (a) a Darcy flow dataset and (b) steady RANS simulations of 2-D airfoils, providing comparisons with both a single-GPU implementation and node-based graph convolutional networks (GCNs). The DS-MPNN model achieves accuracy comparable to the single-GPU variant (S-MPNN), accommodates significantly more nodes than S-MPNN, and substantially outperforms the node-based GCN.
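To make the two ingredients named in the abstract concrete, the following minimal PyTorch sketch illustrates (i) decomposing a graph's nodes into subdomains with a halo of nearby nodes from adjacent subdomains and (ii) Nystr\"om-style subsampling of a fixed number of neighbors per node for one message-passing step. This is an illustrative assumption-laden sketch, not the paper's implementation: the strip partitioning, the names \texttt{n\_subdomains}, \texttt{sample\_k}, and \texttt{halo\_width}, the uniform neighbor sampling standing in for Nystr\"om column sampling, and the fixed random linear update standing in for learned MLPs are all placeholders; in practice each subdomain would run on its own GPU.

\begin{verbatim}
# Hedged sketch (not the authors' code): domain decomposition with halos
# plus Nystrom-style neighbor subsampling for one message-passing step.
import torch

torch.manual_seed(0)

n_nodes, n_feats = 1000, 16
pos = torch.rand(n_nodes, 2)        # 2-D node coordinates (e.g., mesh points)
x = torch.rand(n_nodes, n_feats)    # node features

# --- Domain decomposition: split nodes into vertical strips of the domain ---
n_subdomains = 4                    # illustrative choice
strip = (pos[:, 0] * n_subdomains).clamp(max=n_subdomains - 1).long()

# --- Sampling parameters (illustrative) ---
sample_k = 8                        # neighbors kept per node
halo_width = 0.05                   # overlap with adjacent strips

def message_passing_step(x, pos, owned, halo, sample_k):
    """One edge-based MPNN step restricted to a subdomain and its halo."""
    local = torch.cat([owned, halo])
    # Sample k candidate neighbors per owned node from the local index set
    # (a uniform stand-in for Nystrom column sampling of edges).
    idx = torch.randint(len(local), (len(owned), sample_k))
    nbr = local[idx]                               # (n_owned, k) global ids
    # Edge message: relative position concatenated with neighbor features.
    rel = pos[nbr] - pos[owned].unsqueeze(1)       # (n_owned, k, 2)
    msg = torch.cat([rel, x[nbr]], dim=-1)         # (n_owned, k, 2 + F)
    # Aggregate (mean) and update owned-node features; the fixed random
    # linear map is a stand-in for a learned MLP.
    agg = msg.mean(dim=1)
    W = torch.rand(agg.shape[-1], x.shape[-1])
    return x[owned] + agg @ W

x_new = x.clone()
for s in range(n_subdomains):
    owned = (strip == s).nonzero(as_tuple=True)[0]
    # Halo: nodes from other strips within halo_width of this strip's bounds.
    lo, hi = s / n_subdomains, (s + 1) / n_subdomains
    near = (pos[:, 0] > lo - halo_width) & (pos[:, 0] < hi + halo_width)
    halo = (near & (strip != s)).nonzero(as_tuple=True)[0]
    x_new[owned] = message_passing_step(x, pos, owned, halo, sample_k)

print(x_new.shape)   # torch.Size([1000, 16]); one update of all owned nodes
\end{verbatim}

In a distributed run, each subdomain's owned nodes live on one device and only the halo features are exchanged between devices per message-passing layer, which is what keeps the per-GPU memory bounded as the total node count grows.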