Asynchronous Load Balancing and Auto-scaling: Mean-Field Limit and Optimal Design

We develop a Markovian framework for load balancing in which classical algorithms such as Power-of-$d$ are combined with auto-scaling mechanisms that allow the net service capacity to scale up or down in response to the current load, on the same timescale as the job dynamics. Our framework is inspired by serverless platforms such as Knative, where servers are software functions that can be flexibly instantiated in milliseconds according to scaling rules defined by the users of the platform. The main question is how to design such scaling rules to minimize user-perceived delay while guaranteeing low energy consumption. For the first time, we investigate this problem when the auto-scaling and load balancing processes operate \emph{asynchronously}, as in Knative. One advantage of asynchronism is that jobs do not necessarily have to wait whenever a scale-up decision is taken. In our main result, we find a general condition on the structure of scaling rules that drives the mean-field dynamics to delay and relative energy optimality, i.e., a situation where both the user-perceived delay and the relative energy wastage induced by idle servers vanish in the limit where the network demand grows to infinity in proportion to the nominal service capacity. The identified condition suggests scaling up the current net capacity if and only if the mean demand exceeds the rate at which servers become idle and active. Finally, we propose \emph{Rate-Idle}, a scaling rule that satisfies our optimality condition, and we show by means of numerical simulations that it improves delay performance over existing (synchronous) schemes.
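As a rough illustration of the two ingredients named above, the following Python sketch shows a Power-of-$d$ dispatch step and a scale-up test of the form stated in the optimality condition. It is a minimal sketch under stated assumptions, not the paper's model or the Rate-Idle rule itself: the function names `power_of_d` and `should_scale_up`, the parameter `idle_on_rate`, and the choice to sample the $d$ servers without replacement are all hypothetical, introduced only for illustration.

```python
import random


def power_of_d(queue_lengths, d, rng=random):
    """Return the index of the shortest queue among d servers sampled
    uniformly at random (assumption: sampling without replacement)."""
    candidates = rng.sample(range(len(queue_lengths)), d)
    return min(candidates, key=lambda i: queue_lengths[i])


def should_scale_up(mean_demand, idle_on_rate):
    """Scale up if and only if the mean demand exceeds the rate at which
    servers become idle and active (hypothetical parameter names)."""
    return mean_demand > idle_on_rate


if __name__ == "__main__":
    queues = [3, 0, 2, 5]
    chosen = power_of_d(queues, d=2)
    print("dispatch to server", chosen)
    print("scale up?", should_scale_up(mean_demand=10.0, idle_on_rate=8.0))
```

In an asynchronous design, the two functions would be invoked by independent processes, so a dispatch step never blocks on a scaling decision.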
