As we enter the exascale computing era, using power efficiently and optimizing the performance of scientific applications under power and energy constraints have become both critical and challenging. We propose a low-overhead framework that autotunes the performance and energy of hybrid MPI/OpenMP scientific applications at large scale and explores the tradeoffs between application runtime and power/energy to enable energy-efficient execution. We use this framework to autotune four ECP proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that the framework has low overhead and scales well. Using the framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% improvement in energy-delay product (EDP) on up to 4,096 nodes.
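To make the runtime/energy tradeoff concrete, the following minimal sketch compares a few measured configurations under three objectives: runtime, energy, and EDP, taken here as energy multiplied by runtime. It is an illustration under stated assumptions, not the framework itself: the `Measurement` class, the helper names, the configuration labels, and all numbers are hypothetical, and the improvement formula (relative reduction against a baseline) is assumed rather than taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One measured run of an application configuration (hypothetical data)."""
    label: str
    runtime_s: float   # wall-clock runtime in seconds
    energy_j: float    # total energy in joules

    @property
    def avg_power_w(self) -> float:
        # Average power is total energy divided by runtime.
        return self.energy_j / self.runtime_s

    @property
    def edp(self) -> float:
        # Energy-delay product: energy multiplied by runtime (lower is better).
        return self.energy_j * self.runtime_s


def improvement_pct(baseline: float, tuned: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent (assumed definition)."""
    return 100.0 * (baseline - tuned) / baseline


def best_by(measurements, metric: str) -> Measurement:
    """Return the measurement that minimizes the named metric."""
    return min(measurements, key=lambda m: getattr(m, metric))


# Illustrative numbers only; they are not results from the paper.
baseline = Measurement("baseline", runtime_s=100.0, energy_j=20000.0)
candidates = [
    Measurement("config_a", runtime_s=80.0, energy_j=17000.0),
    Measurement("config_b", runtime_s=95.0, energy_j=15000.0),
    Measurement("config_c", runtime_s=70.0, energy_j=21000.0),
]

# Each objective can select a different configuration.
for metric in ("runtime_s", "energy_j", "edp"):
    best = best_by(candidates, metric)
    gain = improvement_pct(getattr(baseline, metric), getattr(best, metric))
    print(f"{metric}: best={best.label}, improvement={gain:.2f}%")
```

In this made-up data, each objective picks a different configuration: the fastest run uses the most energy, while the lowest-energy run is nearly as slow as the baseline. This is why the tuning objective has to be chosen explicitly when exploring such tradeoffs.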