We present an expanded study of the performance of FLASH when using Linux kernel hugepages on Ookami, an HPE Apollo 80 A64FX platform. FLASH is a multi-scale, multi-physics simulation code written principally in modern Fortran; it uses the PARAMESH library to manage a block-structured adaptive mesh. Our initial study used only the Fujitsu compiler to enable standard hugepages (hp). Further investigation allowed us to use hp with multiple compilers by linking to the Fujitsu library libmpg, and to use transparent hugepages (thp) by enabling them at the node level. Comparing hardware counters and in-code timers, we found that neither hp nor thp significantly affects the runtime performance of FLASH. Interestingly, the counters do show a significant reduction in TLB misses and differences in cache and memory access counts, and we observed strange behavior when using thp.
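The abstract refers to thp being enabled at the node level. As a minimal sketch of how one might confirm the hugepage state of a Linux node before a run, the Python snippet below reads the kernel's standard sysfs and procfs interfaces. It is not taken from the study: the helper names `parse_thp_mode` and `parse_hugepage_fields` are hypothetical, and the procedure actually used on Ookami is not described here.

```python
import re
from pathlib import Path

# Standard Linux kernel interfaces (assumed present on the node being checked).
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")
MEMINFO = Path("/proc/meminfo")


def parse_thp_mode(text):
    """Return the active THP mode; the kernel marks it with square brackets,
    e.g. 'always [madvise] never' -> 'madvise'. Returns None if no mode is marked."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def parse_hugepage_fields(meminfo_text):
    """Collect the hugepage-related lines of /proc/meminfo as {name: value string}."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and "Huge" in name:
            fields[name.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    # Hypothetical usage: print the node's current settings if the files exist.
    if THP_ENABLED.exists():
        print("thp mode:", parse_thp_mode(THP_ENABLED.read_text()))
    if MEMINFO.exists():
        for name, value in parse_hugepage_fields(MEMINFO.read_text()).items():
            print(f"{name}: {value}")
```

Recording these values alongside timer and hardware-counter output would make it easier to tell which hugepage configuration a given run actually used.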