Developing efficient parallel applications is critical to scientific progress, but it requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale of performance data, but they often rely on the user to manually explore low-level data and are rigid in how the data can be manipulated. We propose Chopper, a Python-based API that provides high-level and flexible performance analysis for both single and multiple executions of parallel applications. Chopper reduces developer effort by providing configurable high-level methods for common performance analysis tasks, such as calculating load imbalance, finding hot paths, identifying scalability bottlenecks, correlating metrics with calling context tree (CCT) nodes, and diagnosing causes of performance variability. Because Chopper lives within a robust and mature Python environment, developers retain fluid access to lower-level data manipulations. We demonstrate how Chopper allows developers to quickly and succinctly explore performance and identify issues across applications such as AMG, Laghos, LULESH, Quicksilver, and Tortuga.
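To make one of the tasks named above concrete, the following is a minimal sketch of a load imbalance calculation written in plain pandas. It does not use or reproduce Chopper's API. The table layout, the column names (`node`, `rank`, `time`), the timing values, and the definition of imbalance as the maximum over ranks divided by the mean over ranks are all assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical per-rank profile: one row per (CCT node, MPI rank).
# Node names and timings are made up for illustration only.
profile = pd.DataFrame({
    "node": ["main"] * 4 + ["solve"] * 4,
    "rank": [0, 1, 2, 3, 0, 1, 2, 3],
    "time": [10.0, 10.0, 10.0, 10.0, 2.0, 2.0, 2.0, 6.0],
})


def load_imbalance(df, metric="time"):
    """Per-node imbalance, assumed here to be max over ranks / mean over ranks."""
    stats = df.groupby("node")[metric].agg(["max", "mean"])
    stats["imbalance"] = stats["max"] / stats["mean"]
    # Most imbalanced nodes first, so likely problem spots appear at the top.
    return stats.sort_values("imbalance", ascending=False)


result = load_imbalance(profile)
print(result)
```

Under this definition a value of 1.0 means every rank spent the same time in a node, and larger values mean a single rank dominates. A high-level API would presumably wrap this kind of grouping behind one configurable call while leaving the underlying table available for manual manipulation.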