Numerical methods of the ADER family, in particular finite-element ADER-DG and finite-volume ADER-WENO methods, are among the most accurate numerical methods for solving quasilinear hyperbolic PDE systems. The internal structure of ADER-DG and ADER-WENO methods contains a large number of basic linear algebra operations related to matrix multiplication. The main interface of software libraries for matrix multiplication in high-performance computing is BLAS. An effective method for integrating the standard functions of the BLAS interface into the implementation of these numerical methods is presented. The matrices involved are small, which allows JIT technologies to be used effectively. The proposed approach operates directly on AoS (array-of-structures) data, which allows flux, source and non-conservative terms to be calculated efficiently without transposition. The measured computational costs demonstrate that the implementation based on the JIT functions of BLAS outperforms both the implementation based on the general BLAS functions and the vanilla implementations by several orders of magnitude. The complexity of developing an implementation based on the proposed approach does not exceed the complexity of developing a vanilla implementation. A roofline performance analysis partly explains the observed reduction in computational costs.
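To illustrate why AoS storage can be fed to a matrix multiplication without transposition, here is a minimal Python sketch. It is not the authors' implementation: the sizes, the difference matrix `d`, the toy flux, and the helper `gemm_row_major` are all hypothetical, and the hand-written loop only stands in for a BLAS `gemm` call or a JIT-generated kernel. The point is that an AoS array of nodal states is already a row-major matrix of shape (nodes x variables), so an operator acting on the node index can be applied to all variables in one product, while pointwise terms such as the flux read each node's variables from a contiguous slice.

```python
def gemm_row_major(m, n, k, a, b):
    """Return C = A @ B for row-major flat lists (A is m x k, B is k x n).

    Stand-in for a BLAS gemm call; written out only to keep the sketch
    self-contained.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


# Hypothetical sizes: 3 nodes in a cell, 2 variables (u, v) per node.
N_NODES, N_VARS = 3, 2

# AoS storage: [u0, v0, u1, v1, u2, v2].
# Read as a matrix, this is row-major with shape N_NODES x N_VARS.
q_aos = [1.0, 10.0, 2.0, 20.0, 3.0, 30.0]

# Hypothetical operator acting on the node index (a cyclic difference matrix).
d = [-1.0, 1.0, 0.0,
     0.0, -1.0, 1.0,
     1.0, 0.0, -1.0]


def flux(u, v):
    """Toy flux of a linear wave system (assumption): f(u, v) = (v, u)."""
    return v, u


# Pointwise term: each node's variables form a contiguous slice of q_aos,
# so the flux is evaluated node by node and written back in AoS order.
flux_aos = []
for i in range(N_NODES):
    u, v = q_aos[i * N_VARS:(i + 1) * N_VARS]
    flux_aos.extend(flux(u, v))

# One matrix product applies d to every variable at once; no transposition
# of q_aos or flux_aos is needed, and the results stay in AoS layout.
dq_aos = gemm_row_major(N_NODES, N_VARS, N_NODES, d, q_aos)
df_aos = gemm_row_major(N_NODES, N_VARS, N_NODES, d, flux_aos)
```

With a structure-of-arrays layout the same product would need either a transposed operand or one matrix-vector product per variable; in the AoS layout the variable index is simply the column index of the right-hand matrix.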