The Sparse GEneral Matrix-Matrix multiplication (SpGEMM) $C = A \times B$ is a fundamental routine extensively used in domains such as machine learning and graph analytics. Despite its relevance, the efficient execution of SpGEMM on vector architectures remains relatively unexplored. The most recent SpGEMM algorithm for these architectures is based on the SParse Accumulator (SPA) approach. Because it computes the columns of $C$ one by one, it is relatively efficient for sparse matrices with several tens of non-zero coefficients per column. However, for matrices with just a few non-zero coefficients per column, this state-of-the-art algorithm cannot fully exploit long vector architectures. To overcome this issue, we propose the SPA paRallel with Sorting (SPARS) algorithm, which, among other optimizations, computes several columns of $C$ in parallel, and the HASH algorithm, which uses dynamically sized hash tables to store intermediate output values. To combine the efficiency of SPA on relatively dense matrix blocks with the high performance of SPARS and HASH on very sparse matrix blocks, we propose H-SPA(t) and H-HASH(t), which dynamically switch between algorithms. Over a set of 40 sparse matrices from the SuiteSparse Matrix Collection, H-SPA(t) and H-HASH(t) obtain average speed-ups of 1.24$\times$ and 1.57$\times$ over SPA, respectively. For the 22 sparsest matrices, they deliver average speed-ups of 1.42$\times$ and 1.99$\times$, respectively.
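To make the column-by-column SPA idea concrete, the following is a minimal scalar sketch in Python. It is not the vectorized implementation evaluated in this work: the CSC storage layout, the `CSC` type, and the `spgemm_spa` function name are all assumptions chosen for illustration. For each column of $B$, the sketch scatters scaled columns of $A$ into a dense accumulator and then gathers the touched entries, sorted by row, into the corresponding column of $C$.

```python
from typing import List, NamedTuple


class CSC(NamedTuple):
    """Hypothetical compressed-sparse-column container used only in this sketch."""
    n_rows: int
    n_cols: int
    col_ptr: List[int]   # column j occupies positions col_ptr[j]:col_ptr[j + 1]
    row_idx: List[int]   # row index of each stored coefficient
    vals: List[float]    # value of each stored coefficient


def spgemm_spa(A: CSC, B: CSC) -> CSC:
    """Compute C = A x B one column at a time with a dense sparse accumulator."""
    assert A.n_cols == B.n_rows, "inner dimensions must match"
    spa = [0.0] * A.n_rows         # dense accumulator, reused for every column
    occupied = [False] * A.n_rows  # marks rows already touched in this column
    col_ptr, row_idx, vals = [0], [], []

    for j in range(B.n_cols):
        touched = []  # rows holding a contribution to column j of C
        for p in range(B.col_ptr[j], B.col_ptr[j + 1]):
            k, b_kj = B.row_idx[p], B.vals[p]
            # Scatter b_kj * A[:, k] into the accumulator.
            for q in range(A.col_ptr[k], A.col_ptr[k + 1]):
                i = A.row_idx[q]
                if not occupied[i]:
                    occupied[i] = True
                    touched.append(i)
                spa[i] += A.vals[q] * b_kj
        # Gather the touched rows in sorted order, then reset the accumulator.
        for i in sorted(touched):
            row_idx.append(i)
            vals.append(spa[i])
            spa[i] = 0.0
            occupied[i] = False
        col_ptr.append(len(row_idx))

    return CSC(A.n_rows, B.n_cols, col_ptr, row_idx, vals)


# Usage: A = [[1, 0], [2, 3]], B = [[0, 4], [5, 0]], so C = [[0, 4], [15, 8]].
A = CSC(2, 2, [0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
B = CSC(2, 2, [0, 1, 2], [1, 0], [5.0, 4.0])
C = spgemm_spa(A, B)
```

In this sketch the work per output column is proportional to the number of contributing coefficients, which gives an intuition for why columns with only a few non-zero coefficients offer little data to fill a long vector register when processed one at a time. Entries that cancel to zero are kept as explicit zeros for simplicity.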