The advent of exascale computing invites an assessment of existing best practices for developing application readiness on the world's largest supercomputers. This work details observations from the last four years of preparing scientific applications to run on the Oak Ridge Leadership Computing Facility's (OLCF) Frontier system. We address a range of software topics, including programmability, tuning, and portability considerations, that are key to moving applications from existing systems to future installations. A set of representative workloads provides case studies for general system and software testing. We evaluate the use of early access systems for development across several generations of hardware. Finally, we discuss how best practices were identified and disseminated to the community through a wide range of activities, including user guides and training events. We conclude with recommendations for ensuring application readiness on future leadership computing systems.