Partial differential equations (PDEs) are crucial for modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate, time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques for distributed-memory parallelism (DMP), tailored to computing explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into Devito, a well-established domain-specific language (DSL) and compiler framework that automatically generates FD solvers from high-level symbolic mathematical input. Users model simulations at a high level of symbolic abstraction and harness HPC-ready distributed-memory parallelism without altering their source code. This drastically reduces both execution time and developer effort. Although the contributions of this work are implemented within the Devito framework, the DMP concepts and techniques are generally applicable to any FD solver. A comprehensive performance evaluation of Devito's DMP via MPI on the ARCHER2 supercomputer shows highly competitive weak and strong scaling, confirming that the proposed approach meets the demands of large-scale scientific simulations.
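The abstract does not describe Devito's internals. The following is only a minimal, self-contained sketch of the general idea behind DMP for explicit FD stencils: domain decomposition plus halo exchange. It is not Devito's implementation. Several assumptions are made for illustration. A 1D heat equation with a 3-point Laplacian stands in for a real solver. The function names (`decompose`, `exchange_halos`, `step_local`, and so on) are hypothetical. The "ranks" are simulated as lists in a single Python process instead of communicating over MPI. The sketch checks that the decomposed run reproduces the serial result.

```python
"""Hypothetical sketch: halo exchange for an explicit 1D FD stencil.

Ranks are simulated in one process (no MPI); each rank owns a contiguous
chunk of the grid padded with RADIUS halo cells per side.
"""

RADIUS = 1  # stencil radius of the 3-point (2nd-order) Laplacian


def step_serial(u, alpha):
    """One explicit (FTCS) step; values outside the domain are assumed zero."""
    padded = [0.0] + u + [0.0]
    return [padded[i + 1] + alpha * (padded[i] - 2 * padded[i + 1] + padded[i + 2])
            for i in range(len(u))]


def decompose(u, nranks):
    """Split u into nranks contiguous chunks, each padded with zeroed halos."""
    n = len(u)
    bounds = [(r * n // nranks, (r + 1) * n // nranks) for r in range(nranks)]
    return [[0.0] * RADIUS + u[lo:hi] + [0.0] * RADIUS for lo, hi in bounds]


def exchange_halos(local_chunks):
    """Copy neighbours' edge-owned values into each rank's halo cells.

    In an MPI code this would be point-to-point messages between neighbouring
    ranks; here it is a list copy. Physical boundaries keep zero halos.
    """
    last = len(local_chunks) - 1
    for r, loc in enumerate(local_chunks):
        left = local_chunks[r - 1] if r > 0 else None
        right = local_chunks[r + 1] if r < last else None
        loc[:RADIUS] = left[-2 * RADIUS:-RADIUS] if left is not None else [0.0] * RADIUS
        loc[-RADIUS:] = right[RADIUS:2 * RADIUS] if right is not None else [0.0] * RADIUS


def step_local(loc, alpha):
    """Update only the owned points; halos are refreshed by the next exchange."""
    new = loc[:]
    for i in range(RADIUS, len(loc) - RADIUS):
        new[i] = loc[i] + alpha * (loc[i - 1] - 2 * loc[i] + loc[i + 1])
    return new


def run_serial(u0, alpha, nsteps):
    u = list(u0)
    for _ in range(nsteps):
        u = step_serial(u, alpha)
    return u


def run_distributed(u0, nranks, alpha, nsteps):
    chunks = decompose(u0, nranks)
    for _ in range(nsteps):
        exchange_halos(chunks)  # communicate before computing the stencil
        chunks = [step_local(loc, alpha) for loc in chunks]
    # Gather the owned (non-halo) points back into one global array
    return [x for loc in chunks for x in loc[RADIUS:-RADIUS]]


if __name__ == "__main__":
    u0 = [0.0] * 20
    u0[10] = 1.0  # initial spike
    serial = run_serial(u0, alpha=0.25, nsteps=50)
    distributed = run_distributed(u0, nranks=4, alpha=0.25, nsteps=50)
    print(serial == distributed)  # True: identical arithmetic on every point
```

Because each rank performs the same floating-point operations on the same operands as the serial loop, the results match exactly. The user-facing code (the stencil) is unchanged; only the decomposition and exchange layer differs. That layer is the part the abstract says Devito generates automatically.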