We consider the problem of job assignment in which a master server aims to compute a set of tasks and is provided with a few child servers to perform the computation, under a uniform straggling pattern where each server is equally likely to straggle. We distribute tasks to the servers so that the master receives most of the tasks even if a significant number of child servers fail to communicate. We first show that all \textit{balanced} assignment schemes have the same expected number of distinct tasks received, and then study the variance. The variance, or the second moment, is a useful metric because the number of distinct tasks received can exhibit high \textit{variation}. We show that constructions using a generalization of ``Balanced Incomplete Block Design'' [11,40] minimize the variance, and that constructions based on repetition coding schemes attain the largest variance. Both the minimum-variance and the maximum-variance designs have their own use cases, depending on whether the master aims for a heavy-tailed or light-tailed distribution of the number of distinct jobs received. We further show the equivalence between job-based and server-based assignment schemes when the number of jobs equals the number of child servers.
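The claims about equal expectation and differing variance can be checked by brute force on small instances. The sketch below is only an illustration under stated assumptions, not the construction from the paper: it assumes that exactly x servers respond and that every x-subset of servers is equally likely, and the helper name `distinct_job_stats`, the parameter choices (7 or 6 servers, 3 jobs per server, x = 2), and the cyclic baseline are all hypothetical. It compares a Fano-plane assignment (a classical Balanced Incomplete Block Design) with a cyclic assignment, and a repetition assignment with a cyclic one.

```python
from fractions import Fraction
from itertools import combinations


def distinct_job_stats(assignment, x):
    """Exact mean and variance of the number of distinct jobs received.

    assignment: list of sets, assignment[s] = jobs given to server s.
    x: number of responding servers (assumption: every x-subset of
       servers is equally likely to be the responding set).
    """
    servers = range(len(assignment))
    # Count the distinct jobs covered by each possible responding set.
    counts = [
        len(set().union(*(assignment[s] for s in subset)))
        for subset in combinations(servers, x)
    ]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean ** 2
    return mean, var


# 7 jobs, 7 servers, 3 jobs per server, each job on 3 servers.
# Fano plane from the difference set {0, 1, 3} mod 7: every pair of
# jobs shares exactly one server.
fano = [{i % 7, (i + 1) % 7, (i + 3) % 7} for i in range(7)]
# Cyclic baseline: server i holds jobs i, i+1, i+2 (mod 7).
cyclic7 = [{i % 7, (i + 1) % 7, (i + 2) % 7} for i in range(7)]

# 6 jobs, 6 servers, 3 jobs per server, each job on 3 servers.
# Repetition: servers 0-2 all hold jobs {0,1,2}, servers 3-5 hold {3,4,5}.
repetition = [{0, 1, 2}] * 3 + [{3, 4, 5}] * 3
cyclic6 = [{i % 6, (i + 1) % 6, (i + 2) % 6} for i in range(6)]

if __name__ == "__main__":
    x = 2  # hypothetical number of responding servers
    for name, scheme in [("fano", fano), ("cyclic7", cyclic7),
                         ("repetition", repetition), ("cyclic6", cyclic6)]:
        mean, var = distinct_job_stats(scheme, x)
        print(f"{name}: mean={mean}, variance={var}")
```

Within each pair of schemes with the same parameters the means coincide, while the block-design assignment gives the smaller variance and the repetition assignment the larger one. Exact rational arithmetic via `Fraction` avoids floating-point noise when comparing the moments.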