Resource management for cloud-native microservices has recently attracted considerable attention. Previous work has shown that machine learning (ML)-driven approaches outperform traditional techniques, such as autoscaling, in both service-level agreement (SLA) maintenance and resource efficiency. However, ML-driven approaches also face challenges, including lengthy data collection and limited scalability. We present Ursa, a lightweight resource management system for cloud-native microservices that addresses these challenges. Ursa uses an analytical model that decomposes the end-to-end SLA into per-service SLAs and maps each per-service SLA to a resource allocation for the corresponding microservice tier. To speed up exploration and avoid prolonged SLA violations, Ursa explores each microservice individually and stops exploring as soon as latency exceeds its SLA. We evaluate Ursa on a set of representative end-to-end microservice topologies, namely a social network, a media service, and a video processing pipeline, each serving multiple request classes and priorities with different SLAs, and compare it against two representative ML-driven systems, Sinan and Firm. Compared with these systems, Ursa shortens data collection by more than 128× and has a control plane that is 43× faster. At the same time, Ursa does not sacrifice resource efficiency or SLAs: during online deployment, it reduces the SLA violation rate by 9.0% to 49.9% and CPU allocation by up to 86.2% relative to the ML-driven approaches.
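The two mechanisms named above, SLA decomposition and per-tier exploration with early stopping, can be illustrated with a minimal sketch. It is not Ursa's analytical model. It assumes a linear call chain whose end-to-end latency is the sum of per-tier latencies, and splits the end-to-end SLA in proportion to each tier's baseline latency. It also assumes a caller-supplied `measure(cpu)` probe that returns a tier's latency at a given CPU allocation. The function names `decompose_sla` and `explore_tier`, the proportional split, and the toy latency model are all hypothetical.

```python
from typing import Callable, Dict, List


def decompose_sla(e2e_sla_ms: float, baseline_ms: Dict[str, float]) -> Dict[str, float]:
    """Split an end-to-end SLA into per-tier SLAs.

    Hypothetical simplification: tiers form a linear chain, end-to-end latency
    is the sum of tier latencies, and each tier gets a share of the budget
    proportional to its baseline latency. Shares sum to e2e_sla_ms.
    """
    total = sum(baseline_ms.values())
    return {tier: e2e_sla_ms * lat / total for tier, lat in baseline_ms.items()}


def explore_tier(
    measure: Callable[[float], float],
    tier_sla_ms: float,
    cpu_levels: List[float],
) -> float:
    """Explore one tier's CPU allocations from largest to smallest.

    Stops at the first allocation whose measured latency violates tier_sla_ms,
    so at most one violating probe is issued. Returns the smallest allocation
    that met the SLA; if even the largest violates, returns the largest.
    """
    levels = sorted(cpu_levels, reverse=True)
    best = levels[0]
    for cpu in levels:
        if measure(cpu) > tier_sla_ms:
            break  # early stop: do not keep probing below a violating level
        best = cpu
    return best


if __name__ == "__main__":
    baselines = {"frontend": 2.0, "compose": 6.0, "storage": 12.0}
    slas = decompose_sla(100.0, baselines)
    print(slas)

    # Toy latency model (assumption): latency scales inversely with CPU cores.
    for tier, base in baselines.items():
        cpu = explore_tier(lambda c, b=base: b / c, slas[tier], [4.0, 2.0, 1.0, 0.5, 0.25, 0.1])
        print(tier, cpu)
```

Exploring each tier independently keeps the search linear in the number of tiers, and walking allocations downward means the exploration issues at most one SLA-violating probe per tier before it stops.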