Achieving resource efficiency while preserving end-user experience is non-trivial for cloud application operators. As cloud applications increasingly adopt microservices, resource managers face two distinct levels of system behavior: end-to-end application latency and per-service resource usage. Translating between these two levels is challenging, however, because user requests traverse heterogeneous services that collectively (but unevenly) contribute to the end-to-end latency. This paper presents Autothrottle, a bi-level, learning-assisted resource management framework for SLO-targeted microservices. It architecturally decouples the mechanisms of application SLO feedback and service resource control, and bridges them with the notion of performance targets. This decoupling enables control policies tailored to each of the two mechanisms, for which we combine lightweight heuristics and learning techniques. We evaluate Autothrottle on three microservice applications, with workload traces from production scenarios. Results show superior CPU resource savings: up to 26.21% over the best-performing baseline, and up to 93.84% over all baselines.
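To make the bi-level decoupling concrete, the following is a minimal sketch and not Autothrottle's actual implementation. It assumes a hypothetical per-service proxy metric in [0, 1] (higher meaning the service is more CPU-starved) as the performance target, and it substitutes a simple threshold rule for the learning component at the application level. All names (`ServiceController`, `adjust_targets`, `step`, `delta`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServiceController:
    """Per-service level (hypothetical): steer CPU allocation toward a target."""
    cpu_cores: float
    target: float           # performance target handed down by the application level
    step: float = 0.1       # assumed fixed adjustment step, in cores
    min_cores: float = 0.1  # assumed lower bound on the allocation

    def update(self, measured: float) -> float:
        # 'measured' is the locally observed proxy metric (assumption:
        # a higher value means the service is more CPU-starved).
        if measured > self.target:
            self.cpu_cores += self.step
        elif measured < self.target:
            self.cpu_cores = max(self.min_cores, self.cpu_cores - self.step)
        return self.cpu_cores


def adjust_targets(targets: Dict[str, float], latency_ms: float,
                   slo_ms: float, delta: float = 0.05) -> Dict[str, float]:
    """Application level (hypothetical): map SLO feedback to per-service targets.

    A threshold rule stands in here for the learned policy: on an SLO
    violation, targets are tightened so services receive more CPU;
    otherwise they are relaxed to save CPU.
    """
    if latency_ms > slo_ms:
        return {name: max(0.0, t - delta) for name, t in targets.items()}
    return {name: min(1.0, t + delta) for name, t in targets.items()}


if __name__ == "__main__":
    targets = adjust_targets({"frontend": 0.2}, latency_ms=250.0, slo_ms=200.0)
    controller = ServiceController(cpu_cores=1.0, target=targets["frontend"])
    print(controller.update(measured=0.5))
```

The point of the sketch is the separation of concerns: the application level sees only end-to-end latency and emits targets, while each service controller sees only its local metric and its target.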