In this work, we show how code generation techniques significantly improve the performance of the computational kernels in the HyTeG software framework. This HPC framework combines the performance and memory advantages of matrix-free multigrid solvers with the flexibility of unstructured meshes. We use the pystencils code generation toolbox to replace the original abstract C++ kernels with highly optimized loop nests. The performance of one of these kernels, the matrix-vector multiplication, is analyzed in detail using the Execution-Cache-Memory (ECM) performance model. We validate the model's predictions with measurements on the SuperMUC-NG supercomputer. The measured performance matches the predictions in most cases; where it does not, we discuss the discrepancies. Additionally, a node-level scaling study shows the behavior expected of a memory-bound compute kernel.