In this paper, we study systems where each job or request can be split into a flexible number of sub-jobs up to a maximum limit. The number of sub-jobs a job is split into depends on the number of available servers found upon its arrival. All sub-jobs of a job are then processed in parallel at different servers, leading to a linear speed-up of the job. We refer to such jobs as {\em adaptive multi-server jobs}. We study the problem of optimal assignment of such jobs when each server can process at most one sub-job at any given instant and there is no waiting room in the system. We assume that, upon arrival, a job can only access a randomly sampled subset of $k(n)$ servers from a total of $n$ servers, and the number of sub-jobs is determined by the number of idle servers within the sampled subset. We analyze the steady-state performance of the system when the system load varies according to $\lambda(n) = 1 - \beta n^{-\alpha}$ for $\alpha \in [0,1)$ and $\beta \geq 0$. Our interest is in finding how large the subset size $k(n)$ should be in order to achieve zero blocking and maximum speed-up in the limit as $n \to \infty$. We first characterize the system's performance when jobs have access to the full system, i.e., $k(n)=n$. In this setting, we show that the blocking probability approaches zero at rate $O(1/\sqrt{n})$ and the mean response time of accepted jobs approaches its minimum achievable value at rate $O(1/n)$. We then consider the case where jobs only have access to a subset of servers, i.e., $k(n) < n$. We show that as long as $k(n)=\omega(n^\alpha)$, the same asymptotic performance can be achieved as in the case with full system access. In particular, for $k(n)=\Theta(n^\alpha \log n)$, we show that both the blocking probability and the mean response time approach their desired limits at rate $O(n^{-(1-\alpha)/2})$.
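As an illustration only, the arrival-time assignment rule described in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the names `load`, `subset_size`, and `assign_job`, the constant `c`, uniform sampling without replacement, and the rule "use as many idle sampled servers as possible, up to the maximum" are all hypothetical choices made for the example.

```python
import math
import random


def load(n, alpha, beta):
    """System load lambda(n) = 1 - beta * n^(-alpha), as stated in the abstract."""
    return 1.0 - beta * n ** (-alpha)


def subset_size(n, alpha, c=1.0):
    """One concrete choice with k(n) = Theta(n^alpha * log n).

    The constant c is a hypothetical tuning parameter; the result is
    clamped to the range [1, n].
    """
    return max(1, min(n, math.ceil(c * n ** alpha * math.log(n))))


def assign_job(busy, k, max_subjobs, rng=random):
    """Sample k servers and split the job over the idle ones found.

    busy[i] is True when server i is processing a sub-job. The function
    marks the chosen servers busy and returns their indices. An empty
    list means the job is blocked, since there is no waiting room.
    """
    # Assumption: the subset is drawn uniformly without replacement.
    sampled = rng.sample(range(len(busy)), k)
    idle = [i for i in sampled if not busy[i]]
    # Assumption: use every idle sampled server, up to the maximum limit.
    chosen = idle[:max_subjobs]
    for i in chosen:
        busy[i] = True
    return chosen
```

With all servers idle and `k` equal to the number of servers, `assign_job` returns exactly `max_subjobs` servers; with all servers busy it returns an empty list, which corresponds to a blocked job.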