We propose a novel approach to iterated sparse-times-dense matrix multiplication, a fundamental computational kernel in scientific computing and graph neural network training. When the matrices exceed the memory of a single compute node, data transfer becomes a bottleneck. Approaches based on dense matrix multiplication algorithms scale suboptimally and fail to exploit the sparsity of the problem. To address these challenges, we propose decomposing the sparse matrix into a small number of highly structured matrices called arrow matrices, which are connected by permutations. Our approach enables communication-avoiding multiplications, achieving a polynomial reduction in communication volume per iteration for matrices corresponding to planar graphs and other minor-excluded families of graphs. Our evaluation demonstrates that our approach outperforms a state-of-the-art method for sparse matrix multiplication on matrices with hundreds of millions of rows, achieving near-linear strong and weak scaling.
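To make the structural idea concrete, the following is a minimal NumPy sketch of multiplying a single arrow matrix by a dense tall-skinny matrix. It is not the paper's implementation: the sizes n, b, k and the exact sparsity layout (nonzeros only on the diagonal, in the first b rows, and in the first b columns) are illustrative assumptions, chosen to show why rows outside the arrowhead need only a small slice of the dense operand, which is the communication-avoiding property the abstract alludes to.

```python
import numpy as np

rng = np.random.default_rng(0)
n, b, k = 12, 3, 2   # matrix size, arrowhead width, dense columns (all illustrative)

# Build a small dense "arrow" matrix: nonzeros only on the diagonal,
# in the first b rows, and in the first b columns (hypothetical layout).
A = np.zeros((n, n))
A[np.arange(n), np.arange(n)] = rng.random(n)    # diagonal
A[:b, :] = rng.random((b, n))                    # first b rows (top arm)
A[:, :b] = rng.random((n, b))                    # first b columns (left arm)

X = rng.random((n, k))                           # dense tall-skinny operand

# Multiply block-wise: a row i >= b of the product depends only on
# X[i, :] (via the diagonal) and X[:b, :] (via the left arm), so a rank
# owning those rows never needs the rest of X.
Y = np.empty((n, k))
Y[:b, :] = A[:b, :] @ X                          # arrowhead rows: need all of X
Y[b:, :] = (np.diag(A)[b:, None] * X[b:, :]      # diagonal part: purely local
            + A[b:, :b] @ X[:b, :])              # left arm: needs only X[:b, :]

assert np.allclose(Y, A @ X)                     # agrees with the direct product
```

In a distributed setting, only the b arrowhead rows of X would have to be replicated or exchanged, while all remaining rows stay local; the full decomposition applies a small number of such arrow factors connected by permutations.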