Motivated by its practical importance, we consider the coded distributed matrix multiplication problem of computing $AA^\top$ in a distributed computing system with $N$ worker nodes and a master node, where the input matrices $A$ and $A^\top$ are partitioned into $m$-by-$p$ and $p$-by-$m$ blocks of equal-size sub-matrices, respectively. For effective straggler mitigation, we propose a novel computation strategy, named \emph{folded polynomial codes}, which is obtained by modifying entangled polynomial codes. Moreover, we characterize a lower bound on the optimal recovery threshold among all linear computation strategies when the underlying field is the real number field, and we show that folded polynomial codes achieve this bound when $m=1$. Compared with all known computation strategies for coded distributed matrix multiplication, folded polynomial codes perform better in terms of recovery threshold, download cost, and decoding complexity.
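To make the problem setting concrete, the following is a minimal sketch of the baseline that the abstract says is modified, namely an entangled-polynomial-style code, specialized to $m=1$ for computing $AA^\top$. It is not the folded polynomial code itself, whose construction is not given in the abstract. With $A=[A_0,\dots,A_{p-1}]$ split into column blocks, worker $i$ evaluates $f(x_i)g(x_i)$, where $f(x)=\sum_j A_j x^j$ and $g(x)=\sum_j A_j^\top x^{p-1-j}$, and the master recovers $AA^\top$ as the coefficient of $x^{p-1}$ from any $2p-1$ results. All function names, the evaluation points, and the use of exact rational arithmetic in place of the real field are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def add_scaled(acc, X, c):
    """Return acc + c * X entrywise."""
    return [[a + c * x for a, x in zip(ra, rx)] for ra, rx in zip(acc, X)]


def split_columns(A, p):
    """Split A into p equal-width column blocks A_0, ..., A_{p-1} (the m = 1 case)."""
    w = len(A[0]) // p
    return [[row[j * w:(j + 1) * w] for row in A] for j in range(p)]


def encode(blocks, x):
    """Return (f(x), g(x)) with f(x) = sum_j A_j x^j and g(x) = sum_j A_j^T x^(p-1-j)."""
    p = len(blocks)
    rows, w = len(blocks[0]), len(blocks[0][0])
    f = [[Fraction(0)] * w for _ in range(rows)]
    g_t = [[Fraction(0)] * w for _ in range(rows)]  # holds g(x)^T
    for j, block in enumerate(blocks):
        f = add_scaled(f, block, x ** j)
        g_t = add_scaled(g_t, block, x ** (p - 1 - j))
    return f, transpose(g_t)


def lagrange_coeffs(xs, i):
    """Coefficients (lowest degree first) of the i-th Lagrange basis polynomial."""
    coeffs = [Fraction(1)]
    for k, xk in enumerate(xs):
        if k == i:
            continue
        denom = xs[i] - xk
        new = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):  # multiply by (x - xk) / denom
            new[d] -= c * xk / denom
            new[d + 1] += c / denom
        coeffs = new
    return coeffs


def decode(xs, results, p):
    """Recover the x^(p-1) coefficient of f(x) g(x) from 2p - 1 evaluations."""
    n = len(results[0])
    out = [[Fraction(0)] * n for _ in range(n)]
    for i, R in enumerate(results):
        out = add_scaled(out, R, lagrange_coeffs(xs, i)[p - 1])
    return out


def demo():
    """Check that every subset of 2p - 1 workers out of N yields A A^T."""
    A = [[Fraction(v) for v in row] for row in [[1, 2, 3, 4], [0, 1, -1, 2]]]
    p, N = 2, 5                                  # hypothetical parameters
    blocks = split_columns(A, p)
    points = [Fraction(i + 1) for i in range(N)]  # distinct evaluation points
    worker_out = [matmul(*encode(blocks, x)) for x in points]
    expected = matmul(A, transpose(A))
    threshold = 2 * p - 1                         # baseline threshold for m = 1
    return all(
        decode([points[i] for i in S], [worker_out[i] for i in S], p) == expected
        for S in combinations(range(N), threshold)
    )
```

Calling `demo()` checks every choice of $2p-1$ surviving workers, which is how straggler tolerance shows up in this toy setting: the master never needs the remaining $N-(2p-1)$ results. The threshold $2p-1$ belongs to this baseline only; the abstract claims an improvement over it without stating the value.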