Numerical methods of the ADER family, in particular the finite-element ADER-DG and finite-volume ADER-WENO methods, are among the most accurate numerical methods for solving quasilinear PDE systems. The internal structure of the ADER-DG and ADER-WENO methods involves a large number of basic linear algebra operations related to matrix multiplication. The main interface of software libraries for matrix multiplication in high-performance computing is BLAS. This paper presents an effective method for integrating the standard functions of the BLAS interface into the implementation of these numerical methods. The matrices involved are small; at the same time, the proposed implementation makes effective use of existing JIT technologies. The proposed approach operates directly on array-of-structures (AoS) data, which makes it possible to compute flux, source, and non-conservative terms efficiently without any transposition. The measured computational costs show that the implementation based on the JIT functions of BLAS outperforms both the implementation based on the general BLAS functions and the vanilla implementations by several orders of magnitude. At the same time, developing an implementation based on the proposed approach is no more complex than developing a vanilla implementation.
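To illustrate why an AoS layout can be passed to a matrix multiplication without transposition, here is a minimal sketch. It is not the paper's implementation: the abstract gives no code, so the names `gemm_row_major`, `flux`, `K`, the sizes, and the linear flux are all hypothetical. The pure-Python triple loop is only a stand-in for a BLAS `gemm` call (general or JIT-generated). The point shown is that a flat AoS array with `N_NODES` nodes of `N_VARS` variables each is already a row-major `N_NODES x N_VARS` matrix, so a small operator matrix acts on it directly and the result is again in AoS order.

```python
def gemm_row_major(m, n, k, a, b):
    """Return C = A @ B for flat row-major arrays A (m x k) and B (k x n).

    Stand-in for a BLAS gemm call; a real code would call the library here.
    """
    c = [0.0] * (m * n)
    for i in range(m):
        for p in range(k):
            a_ip = a[i * k + p]
            for j in range(n):
                c[i * n + j] += a_ip * b[p * n + j]
    return c


def flux(u):
    """Hypothetical pointwise flux: every variable is advected with speed 2."""
    return [2.0 * v for v in u]


N_NODES = 3  # assumed number of nodes in one cell
N_VARS = 2   # assumed number of conserved variables per node

# AoS storage: the variables of one node are contiguous in memory.
q = [1.0, 10.0,   # node 0
     2.0, 20.0,   # node 1
     3.0, 30.0]   # node 2

# The flux is evaluated node by node on contiguous slices, with no reordering.
f = []
for node in range(N_NODES):
    f.extend(flux(q[node * N_VARS:(node + 1) * N_VARS]))

# Hypothetical small operator matrix (N_NODES x N_NODES), row-major.
K = [-1.0,  1.0, 0.0,
     -0.5,  0.0, 0.5,
      0.0, -1.0, 1.0]

# The flat AoS array f is used as the N_NODES x N_VARS right-hand matrix as is.
r = gemm_row_major(N_NODES, N_VARS, N_NODES, K, f)
```

Because `r` has the same node-major ordering as `q`, it can be fed to the next pointwise evaluation or the next multiplication without any layout change. In a compiled implementation, the matrix sizes are fixed and small, which is the situation in which JIT-generated `gemm` kernels are typically used.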