We propose uBFT, the first State Machine Replication (SMR) system to achieve microsecond-scale latency in data centers while using only $2f{+}1$ replicas to tolerate $f$ Byzantine failures. The Byzantine Fault Tolerance (BFT) provided by uBFT is essential because pure crashes appear to be a mere illusion: real-life systems reportedly fail in many unexpected ways. uBFT relies on a small, non-tailored trusted computing base (disaggregated memory) and consumes a practically bounded amount of memory. uBFT is based on a novel abstraction called Consistent Tail Broadcast, which we use to prevent equivocation while bounding memory. We implement uBFT using RDMA-based disaggregated memory and obtain an end-to-end latency of as little as 10 µs. This is at least 50$\times$ faster than MinBFT, a state-of-the-art $2f{+}1$ BFT SMR system based on Intel's SGX. We use uBFT to replicate two KV-stores (Memcached and Redis), as well as a financial order matching engine (Liquibook). These applications have low latency (up to 20 µs) and become Byzantine-tolerant with as little as 10 µs of added latency. The price for uBFT is a small amount of reliable disaggregated memory (less than 1 MiB), which in our prototype consists of a small number of memory servers connected through RDMA and replicated for fault tolerance.