Scalable Performance Evaluation of Byzantine Fault-Tolerant Systems Using Network Simulation

From arXiv. 10 pages; accepted at the 28th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2023), October 24-27, 2023, Singapore.

Recent Byzantine fault-tolerant (BFT) state machine replication (SMR) protocols increasingly focus on scalability to meet the requirements of distributed ledger technology (DLT). Validating the performance of scalable BFT protocol implementations requires careful evaluation. Our solution uses network simulations to forecast the performance of BFT protocols while experimentally scaling the environment. Our method plugs existing BFT implementations into the simulation as-is, without code modification or re-implementation, both of which are often time-consuming and error-prone. Furthermore, our approach is significantly cheaper than experiments with real large-scale cloud deployments. In this paper, we first explain our simulation architecture, which enables scalable performance evaluations of BFT systems through high-performance network simulations. We validate the accuracy of these simulations for predicting the performance of BFT systems by comparing simulation results with measurements of real systems deployed on cloud infrastructures. We find that simulation results approximate the real measurements reasonably well at larger system scales, because the network eventually becomes the dominating factor limiting system performance. In the second part of our paper, we use our simulation method to evaluate the performance of PBFT and BFT protocols from the blockchain generation, such as HotStuff and Kauri, in large-scale and realistic wide-area network scenarios, as well as under induced faults.
