We propose a new sparse matrix format, PackSELL, designed to support diverse data representations and enable efficient sparse matrix-vector multiplication (SpMV) on GPUs. Building on sliced ELLPACK (SELL), PackSELL incorporates delta encoding of column indices and a novel packing scheme that stores each column-index delta together with its value in a single word, thereby reducing memory footprint and data movement. This design further enables fine-grained control over the bit allocation between deltas and values, allowing flexible data representations, including non-IEEE formats. Experimental results show that, when configured for half precision (FP16), the PackSELL-based SpMV kernel outperforms the cuSPARSE SELL-based kernel by up to $1.63\times$. Moreover, with configurations using customized formats, PackSELL achieves FP32-level accuracy while exceeding the performance of FP16 cuSPARSE. These benefits extend to sparse linear solvers; for example, a mixed-precision preconditioned conjugate gradient (PCG) solver using PackSELL achieves up to a $2.09\times$ speedup over the standard full-precision PCG.
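The packing idea can be illustrated with a small Python sketch. It is not the PackSELL layout itself, which the abstract does not specify; it assumes a 32-bit word whose low `delta_bits` bits hold the column-index delta and whose remaining high bits hold a truncated FP32 value. The helper names (`pack_entry`, `unpack_entry`, `encode_row`, `spmv_row`) are hypothetical, and the sketch simply raises an error when a delta does not fit, rather than modelling whatever handling the actual format uses.

```python
import struct

WORD_BITS = 32  # assumed word size for this sketch


def pack_entry(delta: int, value: float, delta_bits: int) -> int:
    """Pack a column-index delta and a truncated FP32 value into one word.

    Assumed layout: the low `delta_bits` bits hold the delta; the high
    (32 - delta_bits) bits hold the leading bits of the FP32 encoding.
    """
    if not 0 < delta_bits < WORD_BITS:
        raise ValueError("delta_bits must be between 1 and 31")
    if not 0 <= delta < (1 << delta_bits):
        raise ValueError("delta does not fit in delta_bits")
    fp32_bits = struct.unpack("<I", struct.pack("<f", value))[0]
    delta_mask = (1 << delta_bits) - 1
    # Drop the low mantissa bits (truncation) to make room for the delta.
    return (fp32_bits & ~delta_mask & 0xFFFFFFFF) | delta


def unpack_entry(word: int, delta_bits: int):
    """Recover (delta, value) from a word produced by pack_entry."""
    delta_mask = (1 << delta_bits) - 1
    delta = word & delta_mask
    value_bits = word & ~delta_mask & 0xFFFFFFFF
    value = struct.unpack("<f", struct.pack("<I", value_bits))[0]
    return delta, value


def encode_row(cols, vals, delta_bits: int):
    """Delta-encode the sorted column indices of one row and pack each entry."""
    words, prev = [], 0
    for col, val in zip(cols, vals):
        words.append(pack_entry(col - prev, val, delta_bits))
        prev = col
    return words


def spmv_row(words, x, delta_bits: int) -> float:
    """Compute one row of y = A x directly from the packed words."""
    col, acc = 0, 0.0
    for word in words:
        delta, val = unpack_entry(word, delta_bits)
        col += delta  # running sum of deltas restores the column index
        acc += val * x[col]
    return acc


if __name__ == "__main__":
    x = [float(i) for i in range(11)]
    row = encode_row([3, 4, 10], [1.5, -2.0, 0.25], delta_bits=8)
    print(spmv_row(row, x, delta_bits=8))
```

Changing `delta_bits` moves the boundary between index range and value precision. In this sketch, `delta_bits=16` leaves the upper 16 bits of the FP32 encoding, which corresponds to a bfloat16-like value obtained by truncation, while smaller settings keep more mantissa bits at the cost of a shorter maximum delta.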