This paper presents insights and opportunities for implementing higher-precision matrix-matrix multiplication (GEMM) in terms of lower-precision, high-performance GEMM. The driving case study approximates double-double precision (FP64x2) GEMM in terms of double precision (FP64) GEMM, leveraging how the BLAS-like Library Instantiation Software (BLIS) framework refactors the Goto Algorithm. With this, it is shown how a GEMM with approximate FP64x2 accuracy can be cast in terms of ten ``cascading'' FP64 GEMMs. Preliminary performance and accuracy experiments show promising results. The demonstrated techniques open up new research directions for more general cascading of higher-precision computation in terms of lower-precision computation for GEMM-like functionality.
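To make the notion of cascading concrete, the following Python sketch gives one possible illustration; it is not the implementation described in this paper. It assumes that each FP64x2 operand is cut into four FP64 slices of at most 24 bits on a grid shared per row of A and per column of B, so that every slice-by-slice FP64 GEMM is exact for small inner dimensions, and that only the slice pairs (s, t) with s + t < 4 are kept, which yields ten FP64 GEMMs. The slice width, the number of slices, the selection rule, the use of exact rationals for the splitting step, and all function names are assumptions made for illustration.

```python
import math
from fractions import Fraction

BITS = 24    # assumed slice width: 2*BITS + log2(k) must stay below 53
SLICES = 4   # assumed number of FP64 slices per FP64x2 operand

def two_sum(a, b):
    """Error-free addition: s + e equals a + b exactly."""
    s = a + b
    t = s - a
    return s, (a - (s - t)) + (b - t)

def dd_add(acc, x):
    """Add the FP64 value x to the double-double accumulator acc = (hi, lo)."""
    s, e = two_sum(acc[0], x)
    return two_sum(s, e + acc[1])

def exact(v):
    """Exact rational value of a double-double pair (hi, lo)."""
    return Fraction(v[0]) + Fraction(v[1])

def split(vec):
    """Cut a vector of (hi, lo) pairs into SLICES FP64 vectors on a shared grid.
    Exact rationals are used for clarity, not speed (an assumption of this sketch)."""
    rem = [exact(v) for v in vec]
    amax = max(abs(r) for r in rem)
    e = math.frexp(float(amax))[1] + 1             # guard bit: amax < 2**e
    pieces = []
    for s in range(SLICES):
        q = Fraction(2) ** (e - BITS * (s + 1))    # grid spacing of slice s
        piece = [int(r / q) * q for r in rem]      # truncate toward zero
        rem = [r - p for r, p in zip(rem, piece)]
        pieces.append([float(p) for p in piece])   # exact: at most BITS bits
    return pieces

def gemm_fp64(A, B):
    """Plain FP64 triple-loop GEMM, standing in for a high-performance GEMM."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i][j] += A[i][p] * B[p][j]
    return C

def slice_pairs():
    """Slice pairs kept by the cascade, least significant first."""
    pairs = [(s, t) for s in range(SLICES) for t in range(SLICES)
             if s + t < SLICES]
    return sorted(pairs, key=lambda st: -(st[0] + st[1]))

def cascaded_gemm(A, B):
    """Approximate C = A * B for matrices of (hi, lo) pairs via FP64 GEMMs."""
    m, k, n = len(A), len(B), len(B[0])
    rows = [split(row) for row in A]                               # rows[i][s][p]
    cols = [split([B[p][j] for p in range(k)]) for j in range(n)]  # cols[j][t][p]
    C = [[(0.0, 0.0)] * n for _ in range(m)]
    for s, t in slice_pairs():
        As = [rows[i][s] for i in range(m)]
        Bt = [[cols[j][t][p] for j in range(n)] for p in range(k)]
        P = gemm_fp64(As, Bt)                      # exact in FP64 for small k
        for i in range(m):
            for j in range(n):
                C[i][j] = dd_add(C[i][j], P[i][j])  # accumulate in double-double
    return C

def to_dd(fr):
    """Round an exact rational to a double-double pair (hi, lo)."""
    hi = float(fr)
    return hi, float(fr - Fraction(hi))

def max_error(C, A, B):
    """Largest absolute error of C against the exact rational product."""
    k = len(B)
    return max(abs(exact(C[i][j])
                   - sum(exact(A[i][p]) * exact(B[p][j]) for p in range(k)))
               for i in range(len(A)) for j in range(len(B[0])))

# Small deterministic test problem (hypothetical data, chosen for illustration).
N = 4
A = [[to_dd(Fraction(1, 3 + i + p)) for p in range(N)] for i in range(N)]
B = [[to_dd(Fraction(1, 5 + 2 * p + j)) for j in range(N)] for p in range(N)]
hiA = [[a[0] for a in row] for row in A]
hiB = [[b[0] for b in row] for row in B]
C_fp64 = [[(c, 0.0) for c in row] for row in gemm_fp64(hiA, hiB)]
print("FP64 error:    ", float(max_error(C_fp64, A, B)))
print("cascaded error:", float(max_error(cascaded_gemm(A, B), A, B)))
```

On this small test problem the cascaded result should be many orders of magnitude more accurate than a single FP64 GEMM on the leading parts. In the sketch, the accuracy is limited by the slice pairs that are dropped (s + t >= 4), and the exactness of each FP64 GEMM relies on the inner dimension being small enough that 2*BITS + log2(k) stays below 53.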