Software engineering evolves constantly, and systems must adapt to remain dependable. Each software iteration can introduce new problems, from unforeseen bugs to performance anomalies, and these must be understood and addressed to keep the system operating robustly throughout its lifetime. This work proposes employing software diversity to enhance system reliability and performance simultaneously. A cornerstone of our work is the derivation of a reliability metric that captures the reliability and performance of each software version under adverse conditions. Using the resulting reliability score, we implemented a dynamic controller that adjusts the population (replica count) of each software version. The goal is to maintain a higher replica count for more reliable versions while preserving version diversity as much as possible. This balance is crucial for ensuring both the reliability and the performance of the system against a spectrum of potential failures. In addition, we designed and implemented a diversity-aware autoscaling algorithm that maintains the reliability and performance of the system simultaneously and at any scale. Our extensive experiments on realistic cloud microservice-based applications show that the proposed approach is effective in promoting both reliability and performance.
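To make the controller's goal concrete, the following is a minimal sketch of one way to split a fixed replica budget across versions in proportion to their reliability scores while keeping every version alive. It is an illustration under stated assumptions, not the controller described in this work: the function name `allocate_replicas`, the `min_per_version` floor used to preserve diversity, and the largest-remainder rounding are all hypothetical choices.

```python
from typing import Dict


def allocate_replicas(
    scores: Dict[str, float],
    total: int,
    min_per_version: int = 1,
) -> Dict[str, int]:
    """Split `total` replicas across versions, favouring higher scores.

    Assumptions (not taken from the paper): scores are non-negative,
    and diversity is preserved by a fixed floor of `min_per_version`
    replicas for every version.
    """
    if not scores:
        raise ValueError("at least one version is required")
    if any(s < 0 for s in scores.values()):
        raise ValueError("scores must be non-negative")
    floor_total = min_per_version * len(scores)
    if total < floor_total:
        raise ValueError("budget too small to keep every version running")

    # Start from the diversity floor, then distribute what is left.
    alloc = {version: min_per_version for version in scores}
    remaining = total - floor_total
    if remaining == 0:
        return alloc

    score_sum = sum(scores.values())
    if score_sum == 0:
        # No version is preferred: share the remainder equally.
        shares = {version: remaining / len(scores) for version in scores}
    else:
        shares = {
            version: remaining * score / score_sum
            for version, score in scores.items()
        }

    # Give each version the whole part of its proportional share.
    for version, share in shares.items():
        alloc[version] += int(share)

    # Largest-remainder rounding: hand out the leftover replicas to the
    # versions with the biggest fractional parts (ties go to higher scores).
    leftover = total - sum(alloc.values())
    ranked = sorted(
        scores,
        key=lambda v: (shares[v] - int(shares[v]), scores[v]),
        reverse=True,
    )
    for version in ranked[:leftover]:
        alloc[version] += 1
    return alloc
```

For example, with hypothetical scores of 0.9, 0.6, and 0.3 for three versions and a budget of 10 replicas, this sketch assigns 5, 3, and 2 replicas respectively. A version with a score of zero still keeps its floor replica, which is how the sketch trades some reliability for diversity.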