Asynchronous Load Balancing and Auto-scaling: Mean-Field Limit and Optimal Design

We develop a Markovian framework for load balancing that combines classical algorithms such as Power-of-$d$ with auto-scaling mechanisms that allow the net service capacity to scale up or down in response to the current load on the same timescale as job dynamics. Our framework is inspired by serverless platforms, such as Knative, where servers are software functions that can be flexibly instantiated in milliseconds according to scaling rules defined by the users of the platform. The main question is how to design such scaling rules to minimize user-perceived delay while ensuring low energy consumption. For the first time, we investigate this problem when the auto-scaling and load balancing processes operate asynchronously (or proactively), as in Knative. In contrast to the synchronous (or reactive) paradigm, asynchronism has the advantage that jobs do not necessarily need to wait each time a scale-up decision is taken. In our main result, we identify a general condition on the structure of scaling rules that drives the mean-field dynamics to delay and relative energy optimality, i.e., a situation where both the user-perceived delay and the relative energy waste induced by idle servers vanish in the limit where the network demand grows to infinity in proportion to the nominal service capacity. The identified condition suggests scaling up the current net capacity if and only if the mean demand exceeds the rate at which servers become idle and active. Finally, we propose a family of scaling rules that satisfy our optimality condition. Numerical simulations demonstrate that these rules provide better delay performance than existing synchronous auto-scaling schemes while inducing almost the same power consumption.
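
For readers unfamiliar with the dispatching rule named above, the following is a minimal sketch of classical Power-of-$d$ only, not of the auto-scaling framework developed in this paper. The function name `power_of_d`, the list-of-queue-lengths representation, and sampling without replacement are assumptions made for illustration.

```python
import random


def power_of_d(queue_lengths, d, rng=random):
    """Return the index of the server that receives the next job.

    Illustrative assumption: each server is described only by its
    current number of jobs, and d <= len(queue_lengths).
    """
    # Sample d distinct servers uniformly at random.
    candidates = rng.sample(range(len(queue_lengths)), d)
    # Dispatch to the sampled server holding the fewest jobs.
    return min(candidates, key=lambda i: queue_lengths[i])


if __name__ == "__main__":
    queues = [3, 0, 5, 2]
    chosen = power_of_d(queues, d=2)
    queues[chosen] += 1  # the job joins the chosen server
    print(chosen, queues)
```

With $d = 1$ the rule reduces to uniformly random dispatching, and with $d$ equal to the number of servers it reduces to join-the-shortest-queue.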
