Many cloud applications are migrating from a monolithic model to a microservices framework in which hundreds of loosely coupled microservices run concurrently, with significant benefits in terms of scalability, rapid development, modularity, and isolation. However, dependencies among microservices with uneven execution times may result in longer queues, idle resources, or Quality-of-Service (QoS) violations. In this paper, we introduce Reclaimer, a deep reinforcement learning model that adapts to runtime changes in the number and behavior of microservices in order to minimize CPU core allocation while meeting QoS requirements. When evaluated with two benchmark microservice-based applications, Reclaimer reduces the mean CPU core allocation by 38.4% to 74.4% relative to the industry-standard scaling solution, and by 27.5% to 58.1% relative to a current state-of-the-art method.