Sparse Matrix-Vector Multiplication (SpMV) is a cornerstone of many iterative workloads, including large-scale graph analytics and sparse iterative solvers. Accelerating SpMV on real-world graphs remains challenging because of their highly irregular sparsity patterns. In this paper, we propose MERBIT, a GPU SpMV method designed for repeated SpMV on irregular, graph-like sparse matrices, with PageRank as a representative motivating workload. MERBIT combines two key ideas from existing GPU SpMV methods. At the global level, it uses merge-path partitioning to balance work across both nonzeros and row boundaries. At the local level, it encodes each merge-path segment with a compact bit-field descriptor. This design improves workload balance and promotes coalesced memory access for both matrix loading and output writes; in addition, MERBIT incorporates three optimization strategies to further enhance performance. Experiments on 50 large irregular datasets demonstrate that MERBIT outperforms competitive baselines, including cuSPARSE, Ginkgo, and academic approaches, achieving average speedups of 1.27× and 1.25× over cuSPARSE COO in single and double precision, respectively.
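To make the global-level idea concrete, the following is a minimal Python sketch (a sequential simulation, not GPU code) of generic merge-path SpMV over a CSR matrix. It follows the well-known merge-based formulation. The path interleaves the row-end offsets with the nonzero indices, so both empty rows and long rows count as units of work, and each segment receives the same number of path items. The function names (`merge_path_search`, `spmv_merge_path`, `row_start_bits`) are hypothetical. `row_start_bits` is only an illustrative per-segment row-boundary flag. It is an assumption and is not MERBIT's bit-field descriptor, whose exact layout is not given here.

```python
from typing import List, Sequence, Tuple


def merge_path_search(diagonal: int, row_end: Sequence[int], nnz: int) -> Tuple[int, int]:
    """Return the (row, nonzero) coordinate where `diagonal` crosses the merge path.

    The path merges row-end offsets (row_ptr[1:]) with nonzero indices 0..nnz-1;
    a row end is consumed once all of that row's nonzeros have been consumed.
    """
    lo = max(0, diagonal - nnz)
    hi = min(diagonal, len(row_end))
    while lo < hi:
        mid = (lo + hi) // 2
        if row_end[mid] <= diagonal - mid - 1:
            lo = mid + 1  # row `mid` ends before the split, so more rows lie before it
        else:
            hi = mid
    return lo, diagonal - lo


def spmv_merge_path(row_ptr: Sequence[int], col_idx: Sequence[int],
                    vals: Sequence[float], x: Sequence[float],
                    num_parts: int) -> List[float]:
    """Compute y = A @ x for CSR A, splitting the merge path into `num_parts` equal segments."""
    num_rows, nnz = len(row_ptr) - 1, len(vals)
    row_end = row_ptr[1:]
    path_len = num_rows + nnz               # every row end and every nonzero is one work item
    per_part = -(-path_len // num_parts)    # ceiling division
    y = [0.0] * num_rows
    carries: List[Tuple[int, float]] = []
    for p in range(num_parts):              # each iteration stands in for one GPU thread/warp
        d_start = min(p * per_part, path_len)
        d_end = min(d_start + per_part, path_len)
        row, nz = merge_path_search(d_start, row_end, nnz)
        row_stop, nz_stop = merge_path_search(d_end, row_end, nnz)
        acc = 0.0
        while row < row_stop:               # rows whose row end is consumed in this segment
            while nz < row_end[row]:
                acc += vals[nz] * x[col_idx[nz]]
                nz += 1
            y[row] = acc
            acc = 0.0
            row += 1
        while nz < nz_stop:                 # leading part of a row that continues past this segment
            acc += vals[nz] * x[col_idx[nz]]
            nz += 1
        carries.append((row_stop, acc))
    for row, partial in carries:            # fix-up pass for rows split across segments
        if row < num_rows:
            y[row] += partial
    return y


def row_start_bits(row_ptr: Sequence[int], nz_begin: int, nz_end: int) -> int:
    """Hypothetical descriptor: bit k is set if nonzero nz_begin + k starts some row."""
    bits = 0
    for start in row_ptr[:-1]:
        if nz_begin <= start < nz_end:
            bits |= 1 << (start - nz_begin)
    return bits
```

For example, with `row_ptr = [0, 2, 2, 7, 8]` (row 1 empty, row 2 long), any `num_parts` yields the same `y` as a plain row-by-row CSR loop, because a row split across segments is completed by the carry fix-up. On a GPU that fix-up would typically be a separate pass or use atomics. Note that this toy flag cannot distinguish consecutive empty rows, even though the merge path itself counts them as work.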