We propose a novel approach to iterated sparse-matrix dense-matrix multiplication (SpMM), a fundamental computational kernel in scientific computing and graph neural network training. When the matrices exceed the memory of a single compute node, data transfer becomes the bottleneck. Approaches based on dense matrix multiplication algorithms scale suboptimally and fail to exploit the sparsity of the problem. To address these challenges, we decompose the sparse matrix into a small number of highly structured matrices, called arrow matrices, which are connected by permutations. This decomposition enables communication-avoiding multiplications and achieves a polynomial reduction in communication volume per iteration for matrices corresponding to planar graphs and other minor-excluded families of graphs. Our evaluation shows that the approach outperforms a state-of-the-art method for sparse matrix multiplication on matrices with hundreds of millions of rows, with near-linear strong and weak scaling.
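The decomposition idea can be illustrated with a minimal single-process sketch. It is not the authors' algorithm. It assumes that an arrow matrix of width `b` has nonzeros only in its first `b` rows, its first `b` columns, or within a band of half-width `b` around the diagonal. It also assumes that the permutations are supplied by hand rather than computed. All names (`fits_arrow`, `split_into_arrows`, `arrow_spmm`) are hypothetical. The sketch only checks that a sum of permuted arrow matrices reproduces the product `A @ X`, and it does not model the distributed communication savings.

```python
def fits_arrow(i, j, b):
    # Assumed arrow pattern: first b rows, first b columns, or a band around the diagonal.
    return i < b or j < b or abs(i - j) < b

def spmm(entries, X, n):
    # Reference product of a sparse matrix (dict {(row, col): value}) with a dense n x d matrix X.
    d = len(X[0])
    Y = [[0.0] * d for _ in range(n)]
    for (i, j), v in entries.items():
        for c in range(d):
            Y[i][c] += v * X[j][c]
    return Y

def split_into_arrows(A, perms, b):
    # Place each nonzero in the first permuted term whose arrow pattern accepts it.
    # perms[k][v] is the position of index v under the k-th permutation.
    terms = [dict() for _ in perms]
    for (i, j), v in A.items():
        for k, p in enumerate(perms):
            if fits_arrow(p[i], p[j], b):
                terms[k][(p[i], p[j])] = v
                break
        else:
            raise ValueError(f"entry {(i, j)} fits none of the given permutations")
    return terms

def arrow_spmm(terms, perms, X, n):
    # Compute the sum over k of P_k^T (B_k (P_k X)): permute X, multiply, permute back.
    d = len(X[0])
    Y = [[0.0] * d for _ in range(n)]
    for B, p in zip(terms, perms):
        Xp = [None] * n
        for v in range(n):
            Xp[p[v]] = X[v]
        Yp = spmm(B, Xp, n)
        for v in range(n):
            for c in range(d):
                Y[v][c] += Yp[p[v]][c]
    return Y

# Toy input: a path on 6 vertices plus one long-range edge between vertices 2 and 5.
n, b = 6, 2
A = {}
for i in range(n - 1):
    A[(i, i + 1)] = 1.0
    A[(i + 1, i)] = 1.0
A[(2, 5)] = 1.0
A[(5, 2)] = 1.0

# Identity ordering, then a hand-picked ordering that moves vertices 2 and 5 to the front.
perms = [list(range(n)), [2, 3, 0, 4, 5, 1]]
terms = split_into_arrows(A, perms, b)

X = [[1.0, float(i)] for i in range(n)]
Y_arrow = arrow_spmm(terms, perms, X, n)
Y_ref = spmm(A, X, n)
```

In this toy input the path edges all fit the band of the first term, while the long-range edge fits only after the second permutation moves its endpoints into the first `b` rows and columns. For iterated multiplication, the same `terms` and `perms` would be reused in every step, with `X` replaced by the previous result.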