We present a simple hierarchical communication scheme for distributed Fast Multipole Methods (FMMs) based on MPI neighborhood collectives and uniform trees. The method targets the common case of extending an existing high-performance shared-memory uniform-tree FMM implementation to distributed memory with minimal redesign, while preserving any shared-memory optimizations. Benchmarks on the ARCHER2 supercomputer show that our method scales to very large problem sizes: in our largest runs we demonstrate weak scaling up to 3.2e10 uniformly distributed points on 512 nodes of the machine. Our simplifications based on uniform trees result in worse asymptotic scaling for non-uniform point distributions; however, we still obtain practically useful runtimes because we retain our shared-memory optimizations.
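To illustrate why a uniform tree keeps the bookkeeping simple, the following minimal sketch maps a point to its leaf box and computes a box key by bit interleaving. It assumes points lie in the unit cube and that boxes are indexed in Morton order, a common choice that is not stated above; the function names `leaf_coords`, `morton_encode`, and `parent_key` are hypothetical and are not taken from the authors' implementation.

```python
# Illustrative sketch only: Morton indexing is an assumption, not the
# authors' stated scheme. Points are assumed to lie in [0, 1]^3.

def leaf_coords(point, level):
    """Return integer box coordinates (ix, iy, iz) of the leaf containing `point`."""
    n = 1 << level  # number of boxes per dimension at this level
    # Clamp so that a coordinate of exactly 1.0 falls in the last box.
    return tuple(min(int(c * n), n - 1) for c in point)


def morton_encode(ix, iy, iz, level):
    """Interleave the low `level` bits of three box coordinates into one key."""
    key = 0
    for bit in range(level):
        key |= ((ix >> bit) & 1) << (3 * bit)
        key |= ((iy >> bit) & 1) << (3 * bit + 1)
        key |= ((iz >> bit) & 1) << (3 * bit + 2)
    return key


def parent_key(key):
    """Key of the parent box one level up: drop the lowest three bits."""
    return key >> 3
```

Because every box at a given level has the same size, the owner of a box and the keys of its parent and neighbours follow from integer arithmetic alone, with no tree traversal. This kind of regularity is what makes a fixed communication pattern between neighbouring ranks plausible in a uniform-tree setting.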