Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource job (MRJ) queueing models. In practice, job resource requirements are spread over so many values that the same value is rarely seen twice. This pattern is best modeled by a continuous distribution of requirement values. However, existing theoretical work on stability and throughput-optimality focuses on queueing models with class-based resource requirements. Class-based models demonstrate strong empirical performance only when the number of distinct resource requirements is small, making them a poor match for these practical systems. We introduce the first throughput-optimal family of scheduling policies for the continuous MRJ model, with both preemptive and nonpreemptive variants. We further introduce several efficient policy families which, under some distributional assumptions, remain throughput-optimal while considerably improving computational efficiency. Our policies use a discretization approach, in which the discretization granularity is chosen based on the system load and the distribution of resource requirements. We validate the real-world applicability of our policies by comparing them against existing index-based policies on parametrized distributions and on datacenter trace data from the Google Borg scheduler, demonstrating state-of-the-art performance.
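To make the discretization idea concrete, the following is a minimal sketch and not the policies introduced in this work. It assumes each resource's capacity is normalized to 1 and that requirements are rounded up to a grid of `k` levels per resource, so that continuous requirements collapse into finitely many classes. The names `round_up_to_grid` and `rounding_overhead`, the parameter `k`, and the sample jobs are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Illustrative assumption: capacities are normalized to 1 per resource,
# and k is the number of grid levels per resource (the granularity).
EPS = 1e-12  # tolerance so that values like 0.3 * 10 are not pushed up a level


def round_up_to_grid(requirement, k):
    """Map a requirement vector in (0, 1]^d to integer grid levels in 1..k.

    Level j stands for a rounded requirement of j / k, which is never
    smaller than the true requirement (up to the EPS tolerance).
    """
    return tuple(min(k, max(1, math.ceil(r * k - EPS))) for r in requirement)


def rounding_overhead(requirement, k):
    """Per-resource capacity reserved beyond the true requirement."""
    levels = round_up_to_grid(requirement, k)
    return tuple(level / k - r for level, r in zip(levels, requirement))


# Hypothetical (CPU, memory) requirements; no two jobs are identical.
jobs = [(0.30, 0.12), (0.31, 0.12), (0.07, 0.55)]

# A coarse grid merges similar jobs into few classes but reserves more slack.
coarse_classes = Counter(round_up_to_grid(job, 4) for job in jobs)

# A finer grid reserves less slack but produces more distinct classes.
fine_classes = Counter(round_up_to_grid(job, 10) for job in jobs)
```

Because every requirement is rounded up, any set of jobs that fits under the rounded requirements also fits under the true ones. The cost is reserved but unused capacity of less than `1 / k` per resource per job, which is why a finer grid is needed as the load approaches capacity, at the price of tracking more classes.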