A Deep Recurrent-Reinforcement Learning Method for Intelligent AutoScaling of Serverless Functions

Function-as-a-Service (FaaS) introduces a lightweight, function-based cloud execution model that is relevant to applications such as IoT-edge data processing and anomaly detection. While cloud service providers (CSPs) offer near-infinite function elasticity, these applications often experience fluctuating workloads and stricter performance constraints. A typical CSP strategy, known as "autoscaling", is to empirically determine and adjust the desired number of function instances based on monitoring thresholds such as CPU or memory utilisation, in order to cope with demand and performance. However, threshold configuration requires expert knowledge, historical data, or a complete view of the environment, which makes autoscaling a performance bottleneck that lacks an adaptable solution. Reinforcement Learning (RL) algorithms have proven beneficial for analysing complex cloud environments and yield an adaptable policy that maximizes the expected objectives. Most realistic cloud environments involve operational interference and have limited visibility, making them partially observable. A general solution for tackling observability in highly dynamic settings is to integrate recurrent units with model-free RL algorithms and to model the decision process as a Partially Observable Markov Decision Process (POMDP). Therefore, in this paper, we investigate a model-free recurrent RL agent for function autoscaling and compare it against the model-free Proximal Policy Optimisation (PPO) algorithm. We explore the integration of a Long Short-Term Memory (LSTM) network with the state-of-the-art PPO algorithm and find that, under our experimental and evaluation settings, recurrent policies were able to capture the environment parameters and show promising results for function autoscaling. We further compare a PPO-based autoscaling agent with the threshold-based function autoscaling used commercially, and posit that an LSTM-based autoscaling agent is able to improve throughput by 18%, function execution by 13%, and account for 8.4% more function instances.
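To make the threshold-based baseline concrete, the following is a minimal sketch of a monitoring-threshold autoscaling rule. It assumes a proportional rule in the style of the Kubernetes Horizontal Pod Autoscaler; the abstract does not specify the exact rule used by the commercial baseline, so the function name `desired_instances`, its parameters, and the default bounds and tolerance are all illustrative assumptions.

```python
import math


def desired_instances(current, observed_cpu, target_cpu,
                      min_instances=1, max_instances=10, tolerance=0.1):
    """Hypothetical threshold-based scaling rule (assumed, not from the paper).

    Scales the instance count in proportion to how far the observed CPU
    utilisation is from the configured target, then clamps the result.
    """
    if current <= 0 or target_cpu <= 0:
        raise ValueError("current and target_cpu must be positive")

    ratio = observed_cpu / target_cpu

    # Inside the tolerance band: leave the instance count unchanged
    # to avoid oscillating around the target.
    if abs(ratio - 1.0) <= tolerance:
        return current

    # Proportional rule: more load than the target adds instances,
    # less load removes them.
    desired = math.ceil(current * ratio)

    # Clamp to the configured bounds.
    return max(min_instances, min(max_instances, desired))


if __name__ == "__main__":
    print(desired_instances(current=4, observed_cpu=90, target_cpu=50))
    print(desired_instances(current=4, observed_cpu=52, target_cpu=50))
```

The rule depends entirely on hand-picked values for `target_cpu`, `tolerance`, and the instance bounds, which illustrates the configuration burden that motivates a learned, adaptable policy.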
