Load Balancing with Job-Size Testing: Performance Improvement or Degradation?

In the context of decision making under explorable uncertainty, scheduling with testing is a powerful technique used in the management of computer systems to improve performance through better job-dispatching decisions. Upon job arrival, a scheduler may run a \emph{testing algorithm} against the job to extract information about its structure, e.g., its size, and classify it accordingly. Acquiring this knowledge comes at a cost because the testing algorithm delays the dispatching decision, though this delay is controllable. In this paper, we analyze the impact of this extra cost in a load balancing setting by investigating the following questions: does it really pay off to test jobs? If so, under which conditions? Under mild assumptions relating the information extracted by the testing algorithm to its running time, we show that whether scheduling with testing degrades or improves performance depends strongly on the traffic conditions, the system size and the coefficient of variation of job sizes. Thus, the general answer to the above questions is non-trivial, and care should be taken when deploying a testing policy. We obtain our results by proposing a load balancing model for scheduling with testing, which we analyze in two limiting regimes. When the number of servers grows to infinity in proportion to the network demand, we show that job-size testing actually degrades performance unless short jobs can be predicted reliably almost instantaneously \emph{and} the network load is sufficiently high. When the coefficient of variation of job sizes grows to infinity, we construct testing policies that induce an arbitrarily large performance gain with respect to running jobs untested.
