Coordinating the design of sampling and sparse-dense matrix multiplication (SpMM) is crucial for accelerating graph neural networks (GNNs). However, because existing methods rely on ill-suited sampling strategies, they face a trade-off between accuracy and speed. Moreover, as computational optimizations have progressed, data loading has gradually become the primary bottleneck in GNN inference. To address these issues, we propose AES-SpMM, an adaptive edge-sampling SpMM kernel. It considers the relationship between the number of non-zero elements in each matrix row and the shared-memory width, and adaptively selects an edge-sampling scheme for each row accordingly. AES-SpMM reduces the graph size through adaptive edge sampling so that it fits the GPU's shared memory, lowering the computational cost and enhancing data locality, thus balancing the accuracy and speed of GNN inference. Additionally, we introduce a quantization-based AES-SpMM, which applies quantization and dequantization to the feature data in GNNs. This approach significantly reduces data loading time while keeping the accuracy loss negligible. We evaluated AES-SpMM with common GNN models and datasets. The results show that AES-SpMM outperforms the cuSPARSE SpMM kernel and GE-SpMM by up to 25.87 times and 23.01 times, respectively, with less than 1% accuracy loss. Compared to ES-SpMM, it reduces accuracy loss by 3.4% on average and achieves a 1.31 times speedup. Compared to AES-SpMM, the quantization-based variant incurs a maximum accuracy loss of 0.3% while reducing the feature data loading time overhead by 50.91%-70.51%.
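To make the per-row adaptivity concrete, the following is a minimal host-side sketch, not the paper's GPU kernel: rows of a CSR matrix whose non-zero count already fits within the shared-memory width are kept whole, while longer rows are downsampled to that width. The function name, the uniform-sampling choice, and the CSR representation are illustrative assumptions.

```python
import random

def adaptive_edge_sample(indptr, indices, smem_width, seed=0):
    """Illustrative per-row adaptive edge sampling over a CSR graph.

    Rows with at most `smem_width` non-zeros are kept intact; longer
    rows are sampled down to `smem_width` edges so that each row's
    neighbor list would fit in GPU shared memory. Uniform sampling is
    an assumption; the actual scheme is chosen per row by the kernel.
    """
    rng = random.Random(seed)
    new_indptr, new_indices = [0], []
    for r in range(len(indptr) - 1):
        row = indices[indptr[r]:indptr[r + 1]]
        if len(row) > smem_width:
            # Long row: sample edges down to the shared-memory width.
            row = rng.sample(row, smem_width)
        new_indices.extend(row)
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices
```

In an actual kernel this decision would be made per thread block when loading a row's column indices into shared memory, rather than as a preprocessing pass.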