The increasing heterogeneity of high-performance computing (HPC) systems and the transition to exascale architectures require systematic and reproducible performance evaluation across diverse workloads. While continuous integration (CI) ensures functional correctness in software engineering, performance and energy efficiency in HPC are typically evaluated outside CI workflows, motivating continuous benchmarking (CB) as a complementary approach. Integrating benchmarking into CI workflows enables reproducible evaluation, early detection of regressions, and continuous validation throughout the software development lifecycle. We present exaCB, a framework for continuous benchmarking developed in the context of the JUPITER exascale system. exaCB enables application teams to integrate benchmarking into their workflows while supporting large-scale, system-wide studies through reusable CI/CD components, established harnesses, and a shared reporting protocol. The framework supports incremental adoption, allowing benchmarks to be onboarded easily and to evolve from basic runnability to more advanced instrumentation and reproducibility. The approach is demonstrated in JUREAP, the early-access program for JUPITER, where exaCB enabled continuous benchmarking of over 70 applications at varying maturity levels, supporting cross-application analysis, performance tracking, and energy-aware studies. These results illustrate the practicality of using exaCB for continuous benchmarking on exascale HPC systems across large, diverse collections of scientific applications.