Load Balancing with Job-Size Testing: Performance Improvement or Degradation?

In the context of decision making under explorable uncertainty, scheduling with testing is a powerful technique used in the management of computer systems to improve performance via better job-dispatching decisions. Upon job arrival, a scheduler may run a \emph{testing algorithm} against the job to extract information about its structure, e.g., its size, and classify it accordingly. Acquiring this knowledge comes at a cost because the testing algorithm delays the dispatching decision, though the length of this delay can be controlled. In this paper, we analyze the impact of this extra cost in a load balancing setting by investigating the following questions: does it really pay off to test jobs? If so, under which conditions? Under mild assumptions relating the information extracted by the testing algorithm to its running time, we show that whether scheduling with testing degrades or improves performance depends strongly on the traffic conditions, the system size and the coefficient of variation of job sizes. Thus, the general answer to the above questions is non-trivial, and some care should be taken when deploying a testing policy. We obtain our results by proposing a load balancing model for scheduling with testing, which we analyze in two limiting regimes. When the number of servers grows to infinity in proportion to the network demand, we show that job-size testing actually degrades performance unless short jobs can be predicted reliably almost instantaneously and the network load is sufficiently high. When the coefficient of variation of job sizes grows to infinity, we construct testing policies that induce an arbitrarily large performance gain with respect to running jobs untested.
