Asymptotically Optimal Coded Distributed Computing via Combinatorial Designs

Coded distributed computing (CDC), introduced by Li \emph{et al.}, can greatly reduce the communication load of MapReduce computing systems. In the general cascaded CDC setting with $K$ workers, $N$ input files and $Q$ Reduce functions, each input file is mapped by $r$ workers and each Reduce function is computed by $s$ workers, so that coding techniques can be applied to achieve the maximum multicast gain. The main drawback of most existing CDC schemes is that they require the original data to be split into a number of input files that grows exponentially with $K$, which can significantly increase the coding complexity and degrade system performance. In this paper, we first use a classic combinatorial structure, the $t$-design, for any integer $t\geq 2$, to develop a low-complexity and asymptotically optimal CDC scheme with $r=s$. The main advantages of our $t$-design-based scheme are twofold: 1) it has much smaller $N$ and $Q$ than existing schemes under the same parameters $K$, $r$ and $s$; and 2) it achieves smaller communication loads than the state-of-the-art schemes. Remarkably, unlike previous schemes that operate over large finite fields, our scheme operates over the binary field $\mathbb{F}_2$, the smallest possible field. Furthermore, we show that our construction method can incorporate other combinatorial structures that have properties similar to those of $t$-designs. For instance, we use $t$-GDDs to obtain another asymptotically optimal CDC scheme over $\mathbb{F}_2$ whose parameters differ from those of the $t$-design scheme. Finally, we show that our construction method can also be used to construct CDC schemes with $r\neq s$ that have small numbers of files and Reduce functions.
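
As a small illustration of the combinatorial structure named above, the following Python sketch checks the defining property of a $t$-design: all blocks have the same size, and every $t$-subset of points lies in the same number $\lambda$ of blocks. The helper name `is_t_design` and the choice of the Fano plane as the example are assumptions made for illustration only; the paper's actual file placement and delivery scheme is not reproduced here.

```python
from itertools import combinations
from math import comb

def is_t_design(points, blocks, t):
    """Return the common index lambda if (points, blocks) is a t-design, else None.

    Hypothetical helper for illustration; not taken from the paper.
    """
    blocks = [frozenset(b) for b in blocks]
    # All blocks must have the same size k.
    if len({len(b) for b in blocks}) != 1:
        return None
    # Every t-subset of points must be contained in the same number of blocks.
    counts = {
        sum(1 for b in blocks if set(T) <= b)
        for T in combinations(points, t)
    }
    return counts.pop() if len(counts) == 1 else None

# The Fano plane: 7 points, 7 blocks of size 3, every pair in exactly one block.
fano = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
        {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]

print(is_t_design(range(7), fano, 2))  # 1    (a 2-(7,3,1) design)
print(is_t_design(range(7), fano, 3))  # None (not a 3-design)
print(len(fano), comb(7, 3))           # 7 35 (blocks vs. all 3-subsets)
```

The last line only compares the number of blocks in this design with the number of all 3-subsets of a 7-point set, which gives a rough sense of why a design-based construction can get by with far fewer objects than one indexed by every subset.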