Partial differential equations (PDEs) are central to modelling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving PDEs at large scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques tailored to distributed-memory parallelism (DMP) for solving explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented in the Devito DSL and compiler framework, a well-established solution for automatically generating FD solvers from high-level symbolic math input. Users model simulations at a high level of symbolic abstraction and obtain HPC-ready distributed-memory parallelism without altering their source code, which drastically reduces both execution time and developer effort. Although the contributions are implemented within the Devito framework, the DMP concepts and techniques are generally applicable to any FD solver. A comprehensive performance evaluation of Devito's MPI-based DMP shows highly competitive weak and strong scaling on the Archer2 supercomputer, confirming the effectiveness of the proposed approach in meeting the demands of large-scale scientific simulations.
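To make the idea of distributed-memory parallelism for an explicit FD stencil concrete, the following is a minimal sketch, not the code Devito generates. It assumes a 1D diffusion update with fixed end values, and it simulates the MPI ranks inside a single Python process, so the "halo exchange" is a plain list read where a real code would issue send/receive calls. All function names (`step_serial`, `split`, `step_decomposed`) are hypothetical and chosen for illustration only.

```python
def step_serial(u, alpha):
    """One explicit FD step of 1D diffusion; the two end values stay fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1])
    return new


def split(u, nparts):
    """Split the global array into contiguous chunks, one per simulated rank."""
    n = len(u)
    bounds = [n * r // nparts for r in range(nparts + 1)]
    return [u[bounds[r]:bounds[r + 1]] for r in range(nparts)]


def step_decomposed(chunks, alpha):
    """One step on every chunk, preceded by a (simulated) halo exchange."""
    nparts = len(chunks)
    new_chunks = []
    for r, local in enumerate(chunks):
        # Halo exchange: read one value from each neighbour's previous state.
        # In an MPI code these two reads would be point-to-point messages.
        has_left = r > 0
        has_right = r < nparts - 1
        padded = list(local)
        if has_left:
            padded.insert(0, chunks[r - 1][-1])
        if has_right:
            padded.append(chunks[r + 1][0])
        # Apply the same stencil to the halo-padded local array.
        updated = step_serial(padded, alpha)
        # Strip the halo cells so only locally owned points are kept.
        lo = 1 if has_left else 0
        hi = len(updated) - (1 if has_right else 0)
        new_chunks.append(updated[lo:hi])
    return new_chunks


# Assumed test problem: a unit spike on a 16-point grid, 4 simulated ranks.
u0 = [0.0] * 16
u0[8] = 1.0
serial = u0
chunks = split(u0, 4)
for _ in range(10):
    serial = step_serial(serial, 0.25)
    chunks = step_decomposed(chunks, 0.25)

# Gather the distributed chunks back into one global array.
gathered = [x for chunk in chunks for x in chunk]
```

The gathered result can be compared against the serial one to check that the decomposition does not change the numerics. A three-point stencil needs a halo of width one; wider stencils would need proportionally wider halos, which is the kind of bookkeeping the abstract describes as being automated.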