Crash fault tolerant (CFT) consensus algorithms are commonly used in scenarios where system components are trusted, such as enterprise settings. CFT algorithms offer high throughput and low latency, which makes them attractive for centralized operations that require fault tolerance. However, CFT consensus is vulnerable to Byzantine faults: a single corrupt component can introduce such a fault and break consensus in the system. Byzantine fault tolerant (BFT) consensus algorithms withstand Byzantine faults, but their performance is not competitive with that of CFT algorithms. In this work, we explore a middle ground between BFT and CFT consensus by studying the role of accountability in CFT protocols. That is, if a node in a CFT protocol deviates from the protocol and violates consensus safety, we aim to identify which node was the culprit. Building on Raft, one of the most popular CFT algorithms, we present Raft-Forensics, which provides accountability for Byzantine faults. We prove that if two honest components fail to reach consensus, the Raft-Forensics auditing algorithm identifies the adversarial component that caused the inconsistency. In an empirical evaluation, we show that Raft-Forensics performs similarly to Raft and significantly better than state-of-the-art BFT algorithms. With 256-byte messages, Raft-Forensics achieves 87.8% of vanilla Raft's peak throughput at 46% higher latency, while the state-of-the-art BFT protocol Dumbo-NG achieves only 18.9% of vanilla Raft's peak throughput at nearly $6\times$ higher latency.