We tackle the challenging task of monitoring the performance of CFD applications throughout their development on unstable HPC platforms. We have designed and implemented a monitoring framework that is integrated at the end of a CI/CD pipeline. Measurements collected during the automatic execution of production simulations are analyzed in a visual analytics interface we developed, which provides advanced visualizations and interactions. We validated this approach by monitoring the CFD code Alya over two years, during which we detected and resolved platform-related issues and highlighted performance improvements.