We evaluate the use of Julia, a single language and ecosystem powered by LLVM, to develop workflow components for high-performance computing. We run a Gray-Scott, two-variable diffusion-reaction application that uses a memory-bound, 7-point stencil kernel on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the feasibility, performance, scaling, and trade-offs of (i) the computational kernel on AMD MI250x GPUs, (ii) weak scaling up to 4,096 MPI processes/GPUs (512 nodes), (iii) parallel I/O writes using the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive data analysis. Our results suggest that although Julia generates a reasonable LLVM-IR kernel, a performance gap of nearly 50% remains relative to native AMD HIP stencil codes when running on the GPUs. As expected, we observed near-zero overhead when using Julia's bindings to the system-wide installed MPI and parallel I/O implementations. Overall, Julia emerges as a compelling high-performance and high-productivity workflow composition strategy, as measured on the fastest supercomputer in the world.
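For readers unfamiliar with the kernel, the following Python sketch shows what one explicit Gray-Scott update with a 7-point stencil looks like. It is not the Julia or HIP implementation evaluated here; the periodic boundaries, unit grid spacing, forward-Euler time step, parameter values (`Du`, `Dv`, `F`, `kappa`, `dt`), and all function names are illustrative assumptions. Each cell reads its six face neighbours plus itself for both variables while doing only a handful of floating-point operations, which is why such kernels tend to be memory bound.

```python
"""Minimal sketch of one explicit Gray-Scott step using a 3D 7-point stencil.

Assumptions (illustrative only): periodic boundaries, unit grid spacing,
forward-Euler time stepping, and no noise term.
"""


def laplacian_7pt(f, i, j, k, n):
    """7-point discrete Laplacian at (i, j, k) on an n x n x n periodic grid."""
    return (f[(i - 1) % n][j][k] + f[(i + 1) % n][j][k]
            + f[i][(j - 1) % n][k] + f[i][(j + 1) % n][k]
            + f[i][j][(k - 1) % n] + f[i][j][(k + 1) % n]
            - 6.0 * f[i][j][k])


def uniform(n, value):
    """Allocate an n x n x n grid filled with `value`."""
    return [[[value] * n for _ in range(n)] for _ in range(n)]


def gray_scott_step(u, v, Du=0.2, Dv=0.1, F=0.02, kappa=0.048, dt=1.0):
    """Return new (u, v) grids after one forward-Euler step; inputs are not modified."""
    n = len(u)
    u_new, v_new = uniform(n, 0.0), uniform(n, 0.0)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                uc, vc = u[i][j][k], v[i][j][k]
                uvv = uc * vc * vc  # reaction term u * v^2
                # du/dt = Du * lap(u) - u v^2 + F (1 - u)
                u_new[i][j][k] = uc + dt * (Du * laplacian_7pt(u, i, j, k, n)
                                            - uvv + F * (1.0 - uc))
                # dv/dt = Dv * lap(v) + u v^2 - (F + kappa) v
                v_new[i][j][k] = vc + dt * (Dv * laplacian_7pt(v, i, j, k, n)
                                            + uvv - (F + kappa) * vc)
    return u_new, v_new


if __name__ == "__main__":
    n = 4
    u, v = uniform(n, 1.0), uniform(n, 0.0)
    v[1][1][1] = 0.25  # seed a small perturbation in v
    u, v = gray_scott_step(u, v)
    print(u[1][1][1], v[1][1][1])
```

A production kernel would instead use flat contiguous arrays with a halo layer exchanged via MPI and launch one GPU thread per cell; the triple loop here only makes the stencil access pattern explicit.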