Mean Field Optimal Core Allocation across Malleable Jobs

Modern data centers and cloud computing clusters increasingly run workloads composed of malleable jobs. A malleable job can be parallelized across any number of cores, but it typically exhibits diminishing marginal returns from each additional core. This is reflected in the concavity of the job's speedup function, which describes the job's processing speed as a function of the number of cores on which it runs. Given the prevalence of malleable jobs, several theoretical works have posed the problem of how to allocate a fixed number of cores across a stream of arriving malleable jobs so as to minimize the mean response time across jobs. We refer to this as the Core Allocation to Malleable jobs (CAM) problem. We solve the CAM problem in a highly general setting that allows for multiple job classes, each with its own arbitrary concave speedup function and holding cost (weight). Furthermore, we allow for generally distributed inter-arrival times and job sizes. We analyze the CAM problem in the mean field asymptotic regime and derive two distinct mean field optimal policies, FW-CAM and WHAM. FW-CAM is interesting because it demonstrates a new intuition: in the mean field regime, job sizes are not relevant to finding an optimal policy. WHAM (Whittle Allocation for Malleable jobs) is interesting because it is asymptotically optimal and also serves as a good heuristic outside of the asymptotic regime. Notably, none of the policies previously proposed in the literature are mean field optimal when jobs may follow different speedup functions.
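
To make the notion of a concave speedup function concrete, the following minimal sketch may help. It assumes a hypothetical power-law speedup s(k) = k^p with 0 < p <= 1, chosen here only for illustration and not taken from the paper. The function names and parameters are likewise assumptions. The sketch shows the diminishing marginal return of each added core, and why splitting cores across several such jobs can yield a higher aggregate processing rate than giving all cores to one job. It does not implement FW-CAM or WHAM.

```python
def speedup(k: float, p: float) -> float:
    """Hypothetical concave speedup: processing rate on k cores, for 0 < p <= 1."""
    return k ** p


def marginal_gains(p: float, max_cores: int) -> list[float]:
    """Extra processing rate contributed by the 1st, 2nd, ..., max_cores-th core."""
    return [speedup(k, p) - speedup(k - 1, p) for k in range(1, max_cores + 1)]


def even_split_rate(total_cores: float, num_jobs: int, p: float) -> float:
    """Aggregate processing rate when cores are divided evenly among identical jobs."""
    return num_jobs * speedup(total_cores / num_jobs, p)


if __name__ == "__main__":
    # Each additional core helps less than the previous one.
    print(marginal_gains(0.5, 4))
    # 16 cores on one job versus 16 cores split evenly across 4 jobs.
    print(speedup(16, 0.5), even_split_rate(16, 4, 0.5))
```

Aggregate processing rate is used here only to illustrate concavity. The objective in the CAM problem is mean (weighted) response time, and the jobs may belong to classes with different speedup functions, which is the case this simple even split does not address.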
