Partial differential equations (PDEs) are crucial in modeling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed-memory parallelism (DMP) to execute explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers from high-level symbolic math input. Users can model simulations of real-world applications at a high level of symbolic abstraction and harness HPC-ready distributed-memory parallelism without altering their source code, yielding drastic reductions in both execution time and developer effort. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive strong and weak scaling on CPU and GPU clusters, proving its effectiveness and capability to meet the demands of large-scale scientific simulations.