We present a simple hierarchical communication scheme for distributed Fast Multipole Methods (FMMs) based on MPI neighborhood collectives and uniform trees. The method targets the common case of extending an existing high-performance shared-memory uniform-tree FMM implementation to distributed memory with minimal redesign while preserving any existing shared-memory optimizations. Benchmarks on the ARCHER2 supercomputer demonstrate that our method scales to very large problem sizes: in our largest runs, we demonstrate weak scaling up to 3.2e10 uniformly distributed points on 512 nodes of the machine. Our simplifications based on uniform trees result in worse asymptotic scaling for non-uniform point distributions; however, we still obtain practically useful runtimes because we retain our shared-memory optimizations.