Function-as-a-Service (FaaS) is a cloud computing paradigm that offers applications an event-driven execution model. It is serverless in that it relieves developers of resource management and scales applications transparently and on demand. Typical serverless applications have stringent response-time and scalability requirements and therefore rely on deployed services to provide quick, fault-tolerant feedback to clients. However, the FaaS paradigm suffers from cold starts: initializing a function on demand incurs a non-negligible delay. This work focuses on reducing the frequency of cold starts on the platform using Reinforcement Learning (RL). Our approach uses Q-learning and considers metrics such as function CPU utilization, existing function instances, and response failure rate to initialize functions proactively, based on expected demand. The proposed solution was implemented on Kubeless and evaluated using a normalised real-world function demand trace with matrix multiplication as the workload. The results show that the RL-based agent performs favourably against Kubeless' default policy and a function keep-alive policy, improving throughput by up to 8.81% and reducing computation load and resource wastage by up to 55% and 37%, respectively, as a direct outcome of fewer cold starts.
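To make the Q-learning idea concrete, the following is a minimal tabular sketch, not the implementation evaluated in this work. The abstract names only the observed metrics (CPU utilization, existing function instances, response failure rate); the state bucketing in `discretize`, the three-way action set `ACTIONS`, the hyperparameter values, and the reward passed to `update` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical action set: remove one instance, do nothing, pre-warm one instance.
ACTIONS = (-1, 0, 1)


def discretize(cpu_util, instances, failure_rate, bins=4):
    """Bucket raw metrics into a small discrete state.

    cpu_util and failure_rate are assumed to lie in [0, 1]; the bucketing
    scheme is an assumption, not taken from the paper.
    """
    cpu_bin = min(int(cpu_util * bins), bins - 1)
    fail_bin = min(int(failure_rate * bins), bins - 1)
    return (cpu_bin, instances, fail_bin)


class QAgent:
    """Tabular Q-learning with an epsilon-greedy policy."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # Q-values default to 0.0
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration probability
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick the greedy action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update:
        # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(next_state, a)] for a in ACTIONS)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])


if __name__ == "__main__":
    agent = QAgent(alpha=0.5, epsilon=0.0)
    s = discretize(cpu_util=0.95, instances=2, failure_rate=0.3)
    s_next = discretize(cpu_util=0.60, instances=3, failure_rate=0.0)
    # Hypothetical reward: pre-warming an instance removed the failures.
    agent.update(s, action=1, reward=1.0, next_state=s_next)
    print(agent.act(s))
```

In a real deployment the reward would have to be derived from measured platform behaviour, for example by penalising failed responses and idle instances; how that signal is defined is left open here.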