We evaluate Julia as a single-language, single-ecosystem paradigm, powered by LLVM, for developing workflow components in high-performance computing. We run a Gray-Scott two-variable diffusion-reaction application built on a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We assess feasibility, performance, scaling, and trade-offs for (i) the computational kernel on AMD MI250X GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs, or 512 nodes, (iii) parallel I/O writes through the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive data analysis. Our results suggest that although Julia generates a reasonable LLVM-IR kernel, a nearly 50% performance difference exists relative to native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using the MPI and parallel I/O bindings to system-wide installed implementations. Consequently, Julia emerges as a compelling high-performance, high-productivity workflow composition strategy, as measured on the fastest supercomputer in the world.
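For readers unfamiliar with the kernel named above, the following is a minimal CPU sketch of one explicit time step of a two-variable Gray-Scott system using a 7-point stencil. It is written in plain Python for illustration only and is not the Julia/GPU kernel evaluated here. The function names (`make_field`, `laplacian7`, `gray_scott_step`), the periodic boundaries, the unit grid spacing, the forward-Euler update, and the parameter values `Du`, `Dv`, `F`, `K`, and `dt` are all assumptions chosen for the example.

```python
def make_field(n, value):
    """Allocate an n x n x n grid filled with `value` (hypothetical helper)."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def laplacian7(f, x, y, z, n):
    """7-point stencil: six face neighbours minus 6x the centre.

    Assumes unit grid spacing and periodic boundaries.
    """
    return (
        f[(x - 1) % n][y][z] + f[(x + 1) % n][y][z]
        + f[x][(y - 1) % n][z] + f[x][(y + 1) % n][z]
        + f[x][y][(z - 1) % n] + f[x][y][(z + 1) % n]
        - 6.0 * f[x][y][z]
    )


def gray_scott_step(u, v, n, Du=0.1, Dv=0.05, F=0.02, K=0.048, dt=1.0):
    """Advance u and v by one forward-Euler step (illustrative parameters).

    du/dt = Du * lap(u) - u*v^2 + F*(1 - u)
    dv/dt = Dv * lap(v) + u*v^2 - (F + K)*v
    """
    u_new = make_field(n, 0.0)
    v_new = make_field(n, 0.0)
    for x in range(n):
        for y in range(n):
            for z in range(n):
                uvv = u[x][y][z] * v[x][y][z] ** 2
                du = Du * laplacian7(u, x, y, z, n) - uvv + F * (1.0 - u[x][y][z])
                dv = Dv * laplacian7(v, x, y, z, n) + uvv - (F + K) * v[x][y][z]
                u_new[x][y][z] = u[x][y][z] + dt * du
                v_new[x][y][z] = v[x][y][z] + dt * dv
    return u_new, v_new
```

Each grid point reads seven values per field and performs only a handful of arithmetic operations, which is why a kernel of this shape tends to be limited by memory bandwidth rather than by compute throughput.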