Resilience testing, which measures a system's ability to minimize service degradation caused by unexpected failures, is crucial for microservice systems. Current practice relies on manually defining resilience testing rules for each microservice system. Because the business logic of microservices is diverse, no one-size-fits-all resilience testing rules exist. As the number and dynamism of microservices and failures grow, manual configuration suffers from poor scalability and adaptivity. To address these two issues, we empirically compare the impacts of common failures on resilient and unresilient deployments of a benchmark microservice system. Our study shows that a resilient deployment can block the propagation of degradation from system performance metrics (e.g., memory usage) to business metrics (e.g., response latency). In this paper, we propose AVERT, the first AdaptiVE Resilience Testing framework for microservice systems. AVERT first injects failures into microservices and collects the available monitoring metrics. It then ranks all monitoring metrics by their contribution to the overall service degradation caused by the injected failures. Finally, AVERT produces a resilience index based on how much the degradation in system performance metrics propagates to the degradation in business metrics: the higher the degradation propagation, the lower the resilience of the microservice system. We evaluate AVERT on two open-source benchmark microservice systems. The experimental results show that AVERT can test the resilience of microservice systems accurately and efficiently.
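The three steps described in the abstract (collect metrics under failure injection, rank them by contribution to degradation, derive a resilience index from degradation propagation) can be illustrated with a minimal sketch. The abstract does not specify how AVERT computes any of these quantities, so everything below is an assumption made for illustration: the relative-change degradation score, the ratio-based propagation measure, and the names `degradation`, `rank_metrics`, and `resilience_index` are all hypothetical and are not AVERT's actual method.

```python
from statistics import mean


def degradation(baseline, faulty):
    """Assumed degradation score: relative change of the metric's mean
    between a failure-free window and a failure-injection window."""
    base = mean(baseline)
    if base == 0:
        return 0.0
    return abs(mean(faulty) - base) / abs(base)


def rank_metrics(baseline, faulty):
    """Rank metrics by their (assumed) degradation score, highest first.

    Both arguments map a metric name to a list of samples.
    """
    scores = {name: degradation(baseline[name], faulty[name]) for name in baseline}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def resilience_index(ranked, business_metrics):
    """Assumed index in [0, 1]: 1 minus the share of system-performance
    degradation that shows up in the business metrics."""
    business = [score for name, score in ranked if name in business_metrics]
    system = [score for name, score in ranked if name not in business_metrics]
    if not business or not system or mean(system) == 0:
        return 1.0  # nothing measurable propagated
    propagation = min(mean(business) / mean(system), 1.0)
    return 1.0 - propagation


# Toy data (made up): memory usage doubles under the injected failure.
baseline = {"memory_usage": [50, 50], "response_latency": [100, 100]}
resilient = {"memory_usage": [100, 100], "response_latency": [110, 110]}
unresilient = {"memory_usage": [100, 100], "response_latency": [200, 200]}

business = {"response_latency"}
resilient_score = resilience_index(rank_metrics(baseline, resilient), business)
unresilient_score = resilience_index(rank_metrics(baseline, unresilient), business)
print(resilient_score, unresilient_score)
```

In this toy setup the resilient deployment scores higher because its response latency barely moves while memory usage doubles, which mirrors the abstract's observation that a resilient deployment blocks propagation from system performance metrics to business metrics.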