We consider a distributed computing system in which a master node coordinates $N$ workers to evaluate a function over $n$ input files, where the function admits a general decomposition. In particular, we focus on the general case where the requested function admits a $d$-uniform decomposition, meaning that it can be decomposed into a set of subfunctions, each of which depends on a unique $d$-tuple of the $n$ files. Our objective is to design file and task allocations that minimize both the worst-case communication from the master to any worker and the worst-case computational load across workers. We first show that the optimal file and task allocation, with minimum communication and computation costs, admits a natural characterization within combinatorial design theory: it corresponds to a Steiner system $S(t, k, v)$ with $t=d$, $v=n$, and $k \approx \frac{n}{N^{1/d}}$. However, Steiner systems are known to exist only in very restricted parameter regimes. To overcome this limitation, we propose the information-theory-inspired \emph{Interweaved Clique (IC) design}, a universal and deterministic allocation framework that relaxes the strict structure of Steiner systems by allowing slight variations in the workers' file loads. Although slightly suboptimal, the IC design achieves a communication cost within a constant factor of $4e$ of our converse while maintaining an order-optimal computation cost. This allows us to derive the fundamental scaling laws of this general distributed computing problem over a large range of parameters.
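The Steiner-system correspondence can be illustrated on a toy instance. The sketch below is not the IC design and is not taken from the paper; it only checks the defining property of $S(t, k, v)$ on the Fano plane, the well-known $S(2, 3, 7)$, read as an allocation with $d=2$, $n=7$ files, and $N=7$ workers, so that $k = 3 \approx 7/\sqrt{7}$. The names `FANO_BLOCKS`, `is_steiner_system`, and `allocate_tasks` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Fano plane: the Steiner system S(2, 3, 7) on files 0..6.
# Block w is the set of files stored at (hypothetical) worker w.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def is_steiner_system(blocks, t, k, v):
    """Return True if every t-subset of range(v) lies in exactly one size-k block."""
    if any(len(set(b)) != k for b in blocks):
        return False
    # Count how many blocks contain each t-subset.
    counts = Counter(s for b in blocks for s in combinations(sorted(b), t))
    covered_once = all(counts[s] == 1 for s in combinations(range(v), t))
    # Rule out stray subsets that use elements outside range(v).
    return covered_once and sum(counts.values()) == comb(v, t)


def allocate_tasks(blocks, d):
    """Give worker w every d-tuple of files contained in its own block."""
    return {w: list(combinations(sorted(b), d)) for w, b in enumerate(blocks)}


if __name__ == "__main__":
    tasks = allocate_tasks(FANO_BLOCKS, d=2)
    print("Steiner system:", is_steiner_system(FANO_BLOCKS, t=2, k=3, v=7))
    for worker, subfunctions in tasks.items():
        print(f"worker {worker}: files {FANO_BLOCKS[worker]}, tasks {subfunctions}")
```

In this instance every worker stores 3 files and computes $\binom{3}{2} = 3$ of the $\binom{7}{2} = 21$ subfunctions, so both loads are perfectly balanced and no subfunction is computed twice.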