We evaluate Julia, a single-language ecosystem built on LLVM, for developing workflow components in high-performance computing (HPC). We run a two-variable Gray-Scott diffusion-reaction application, built around a memory-bound 7-point stencil kernel, on Frontier, the US Department of Energy's first exascale supercomputer. We evaluate the performance, scaling, and trade-offs of (i) the computational kernel on AMD MI250X GPUs, (ii) weak scaling up to 4,096 MPI processes, one per GPU (512 nodes), (iii) parallel I/O writes through the ADIOS2 library bindings, and (iv) Jupyter Notebooks for interactive analysis. Results suggest that although Julia generates reasonable LLVM-IR, its GPU kernels show a performance gap of nearly 50% relative to native AMD HIP stencil codes. As expected, we observed near-zero overhead when Julia's MPI and parallel I/O bindings call the system-wide installed implementations. Overall, Julia emerges as a compelling high-performance and high-productivity workflow composition language, as measured on what was the fastest supercomputer in the world at the time of this study.
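To make the kernel named in the abstract concrete, the following is a minimal sketch of one explicit Gray-Scott time step using a 7-point stencil on a periodic 3D grid. It is written in plain Python for readability and is not the Julia or HIP code that was benchmarked. The function names (`make_field`, `laplacian7`, `gray_scott_step`), the parameter values, the forward-Euler update, and the periodic boundaries are all assumptions chosen for illustration.

```python
"""Illustrative Gray-Scott step with a 7-point stencil (not the benchmarked code)."""


def make_field(n, value):
    """Return an n x n x n nested list filled with `value`."""
    return [[[value for _ in range(n)] for _ in range(n)] for _ in range(n)]


def laplacian7(f, i, j, k):
    """7-point Laplacian at (i, j, k), unit grid spacing, periodic boundaries."""
    n = len(f)
    return (
        f[(i - 1) % n][j][k] + f[(i + 1) % n][j][k]
        + f[i][(j - 1) % n][k] + f[i][(j + 1) % n][k]
        + f[i][j][(k - 1) % n] + f[i][j][(k + 1) % n]
        - 6.0 * f[i][j][k]
    )


def gray_scott_step(u, v, Du=0.1, Dv=0.05, F=0.02, kill=0.048, dt=1.0):
    """Advance (u, v) by one forward-Euler step of the Gray-Scott model.

    du/dt = Du * lap(u) - u*v^2 + F*(1 - u)
    dv/dt = Dv * lap(v) + u*v^2 - (F + kill)*v

    All parameter defaults are assumed values for illustration only.
    """
    n = len(u)
    u_new = make_field(n, 0.0)
    v_new = make_field(n, 0.0)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                uc = u[i][j][k]
                vc = v[i][j][k]
                uvv = uc * vc * vc  # reaction term u*v^2
                u_new[i][j][k] = uc + dt * (
                    Du * laplacian7(u, i, j, k) - uvv + F * (1.0 - uc)
                )
                v_new[i][j][k] = vc + dt * (
                    Dv * laplacian7(v, i, j, k) + uvv - (F + kill) * vc
                )
    return u_new, v_new


if __name__ == "__main__":
    # Small demo: uniform background with a perturbed cell in the middle.
    n = 8
    u = make_field(n, 1.0)
    v = make_field(n, 0.0)
    u[n // 2][n // 2][n // 2] = 0.5
    v[n // 2][n // 2][n // 2] = 0.25
    for _ in range(3):
        u, v = gray_scott_step(u, v)
    total_v = sum(v[i][j][k] for i in range(n) for j in range(n) for k in range(n))
    print("total v after 3 steps:", total_v)
```

Each cell update reads the cell and its six face neighbors for both fields and performs only a handful of arithmetic operations, which is why a kernel of this shape tends to be limited by memory bandwidth rather than compute.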