In prior work, Gupta et al. (SPAA 2022) presented a distributed algorithm for multiplying sparse $n \times n$ matrices using $n$ computers. They assumed that the input matrices are uniformly sparse (there are at most $d$ non-zeros in each row and each column) and that the task is to compute a uniformly sparse part of the product matrix. The sparsity structure is globally known in advance (this is the supported setting). As input, each computer receives one row of each input matrix, and each computer needs to output one row of the product matrix. In each communication round, each computer can send and receive one $O(\log n)$-bit message. Their algorithm solves this task in $O(d^{1.907})$ rounds, while the trivial bound is $O(d^2)$. We improve on the prior work in two dimensions. First, we show that we can solve the same task faster, in only $O(d^{1.832})$ rounds. Second, we explore what happens when the matrices are not uniformly sparse. We consider the following alternative notions of sparsity: row-sparse matrices (at most $d$ non-zeros per row), column-sparse matrices (at most $d$ non-zeros per column), matrices with bounded degeneracy (we can recursively delete a row or column with at most $d$ non-zeros), average-sparse matrices (at most $dn$ non-zeros in total), and general matrices.
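To make the sparsity notions concrete, the following Python sketch computes, for a given non-zero pattern, the smallest $d$ for which the pattern is row-sparse, column-sparse, uniformly sparse, $d$-degenerate, and average-sparse. The function names and the input format (a collection of `(row, col)` index pairs) are illustrative assumptions and are not taken from the paper. Degeneracy is computed by greedily deleting the row or column with the fewest remaining non-zeros, which is the standard min-degree peeling for graph degeneracy applied to the bipartite graph whose vertices are rows and columns and whose edges are non-zero entries.

```python
from collections import defaultdict


def degeneracy(entries):
    """Smallest d such that rows/columns with at most d non-zeros can be
    deleted one at a time until no non-zeros remain (hypothetical helper)."""
    # Bipartite graph: one vertex per row ("r", i) and per column ("c", j),
    # one edge per non-zero entry (i, j).
    adj = defaultdict(set)
    for i, j in entries:
        adj[("r", i)].add(("c", j))
        adj[("c", j)].add(("r", i))
    best = 0
    while adj:
        # Delete the row or column with the fewest remaining non-zeros;
        # the largest such count seen during the peeling is the degeneracy.
        v = min(adj, key=lambda u: len(adj[u]))
        best = max(best, len(adj[v]))
        for u in adj.pop(v):
            adj[u].discard(v)  # remove the deleted entries from the other side
    return best


def sparsity_profile(nonzeros, n):
    """Smallest d under each sparsity notion for an n x n pattern
    (hypothetical helper; `nonzeros` is an iterable of (row, col) pairs)."""
    entries = set(nonzeros)
    row_count = defaultdict(int)
    col_count = defaultdict(int)
    for i, j in entries:
        row_count[i] += 1
        col_count[j] += 1
    max_row = max(row_count.values(), default=0)
    max_col = max(col_count.values(), default=0)
    return {
        "row_sparse": max_row,                       # max non-zeros in a row
        "column_sparse": max_col,                    # max non-zeros in a column
        "uniformly_sparse": max(max_row, max_col),   # both bounds at once
        "degeneracy": degeneracy(entries),           # recursive deletion bound
        "average_sparse": len(entries) / n,          # total non-zeros <= d * n
    }


if __name__ == "__main__":
    # "Arrow" pattern: full first row and full first column.
    arrow = [(0, j) for j in range(4)] + [(i, 0) for i in range(1, 4)]
    print(sparsity_profile(arrow, 4))
```

For the $4 \times 4$ arrow pattern, the row, column, and uniform sparsity are all $4$, while the degeneracy is $1$ and the average sparsity is $7/4$, so the weaker notions can be much smaller than uniform sparsity. Deleting all rows one at a time shows that the degeneracy never exceeds the row sparsity, and the same argument applies to columns.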