Universal and Asymptotically Optimal Data and Task Allocation in Distributed Computing

We study the joint minimization of communication and computation costs in distributed computing, where a master node coordinates $N$ workers to evaluate a function over a library of $n$ files. Assuming that the function is decomposed into an arbitrary set $\mathbf{X}$ of subfunctions, each depending on $d$ input files, we cast the problem as a $d$-uniform hypergraph edge partitioning problem, in which the edge set (the subfunction set), defined by $d$-wise dependencies between vertices (files), must be partitioned into $N$ disjoint groups (one per worker). The aim is to design a file and subfunction allocation, corresponding to a partition of $\mathbf{X}$, that minimizes the communication cost $\pi_{\mathbf{X}}$, defined as the maximum number of distinct files stored at any worker, while also minimizing the computation cost $\delta_{\mathbf{X}}$, defined as the maximum subfunction load of any worker. For a broad range of parameters, we propose a deterministic allocation solution, the \emph{Interweaved-Cliques (IC) design}, whose information-theoretic-inspired interweaved clique structure simultaneously achieves order-optimal communication and computation costs for a large class of decompositions $\mathbf{X}$. This optimality follows from our achievability and converse bounds, which show, under reasonable assumptions on the density of $\mathbf{X}$, that the optimal communication cost scales as $n/N^{1/d}$. Our design therefore achieves the order-optimal \textit{partitioning gain}, which scales as $N^{1/d}$, while also achieving an order-optimal computation cost. Notably, this order optimality is achieved deterministically and, most importantly, without knowledge of $\mathbf{X}$, which allows multiple desired functions to be computed without reshuffling files.
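To make the two cost measures concrete, the following is a minimal Python sketch that evaluates $\pi_{\mathbf{X}}$ and $\delta_{\mathbf{X}}$ for a given allocation. The allocation used here is a simple block-based scheme chosen purely for illustration; it is not the IC design, whose construction is not described in this abstract. The function names `costs` and `block_allocation`, the parameter `num_blocks`, and the choice of one worker per multiset of $d$ file blocks are all assumptions introduced for this example.

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement


def costs(assignment):
    """Return (pi, delta) for an allocation.

    assignment maps a worker id to its list of subfunctions, where each
    subfunction is a tuple of d file indices (one hyperedge).
    pi    = maximum number of distinct files needed by any worker
    delta = maximum number of subfunctions assigned to any worker
    """
    pi = max(len({f for edge in edges for f in edge})
             for edges in assignment.values())
    delta = max(len(edges) for edges in assignment.values())
    return pi, delta


def block_allocation(n, d, num_blocks, subfunctions):
    """Illustrative allocation (an assumption, NOT the IC design).

    The n files are split into num_blocks contiguous blocks, and one
    worker is created for every multiset of d blocks. A subfunction is
    sent to the worker whose blocks contain all of its d files.
    """
    def block_of(f):
        return f * num_blocks // n

    worker_of = {
        key: w
        for w, key in enumerate(
            combinations_with_replacement(range(num_blocks), d))
    }
    assignment = defaultdict(list)
    for edge in subfunctions:
        key = tuple(sorted(block_of(f) for f in edge))
        assignment[worker_of[key]].append(edge)
    return dict(assignment), len(worker_of)


# Example: n = 12 files, d = 2, and X = all pairs of files.
n, d, num_blocks = 12, 2, 3
X = list(combinations(range(n), d))
assignment, num_workers = block_allocation(n, d, num_blocks, X)
pi, delta = costs(assignment)
print(num_workers, pi, delta)
```

In this toy instance each worker stores at most $d \cdot n / \texttt{num\_blocks}$ files rather than all $n$, which gives an intuition for why a communication cost on the order of $n/N^{1/d}$ is plausible when the number of workers grows like $\texttt{num\_blocks}^d$. The constants here are not claimed to match those of the paper's bounds.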
