Partial differential equations (PDEs) are crucial in modeling diverse phenomena across scientific disciplines, including seismic and medical imaging, computational fluid dynamics, image processing, and neural networks. Solving these PDEs at scale is an intricate and time-intensive process that demands careful tuning. This paper introduces automated code-generation techniques specifically tailored for distributed-memory parallelism (DMP) to execute explicit finite-difference (FD) stencils at scale, a fundamental challenge in numerous scientific applications. These techniques are implemented and integrated into the Devito DSL and compiler framework, a well-established solution for automating the generation of FD solvers from high-level symbolic math input. Users can model simulations of real-world applications at a high level of symbolic abstraction and harness HPC-ready distributed-memory parallelism without altering their source code, yielding drastic reductions in both execution time and developer effort. A comprehensive performance evaluation of Devito's DMP via MPI demonstrates highly competitive strong and weak scaling on CPU and GPU clusters, proving its effectiveness and capability to meet the demands of large-scale scientific simulations.