Classical state-machine replication protocols, such as Paxos, rely on a distinguished leader process to order commands. Unfortunately, this approach makes the leader a single point of failure and increases latency for clients that are not co-located with it. In response to these drawbacks, Egalitarian Paxos introduced an alternative, leaderless approach that allows replicas to order commands collaboratively. Not relying on a single leader allows the protocol to maintain non-zero throughput despite crashes of up to $f$ arbitrary processes out of a total of $n = 2f+1$. Furthermore, the protocol allows any process to execute a command $c$ fast, in $2$ message delays, provided that no more than $e = \lceil\frac{f+1}{2}\rceil$ other processes fail and all concurrently submitted commands commute with $c$; the latter condition is often satisfied in practical systems. Egalitarian Paxos has served as a foundation for many other replication protocols. Unfortunately, the protocol is very complex, ambiguously specified, and suffers from nontrivial bugs. In this paper, we present EPaxos*, a simpler and correct variant of Egalitarian Paxos. Our key technical contribution is a simpler failure-recovery algorithm, which we have rigorously proved correct. Our protocol also generalizes Egalitarian Paxos to cover the whole spectrum of failure thresholds $f$ and $e$ such that $n \ge \max\{2e+f-1, 2f+1\}$, a number of processes that we show to be optimal.
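To make the process-count bound concrete, the following minimal Python sketch evaluates $\max\{2e+f-1, 2f+1\}$ for given thresholds. The function names `min_processes` and `epaxos_fast_threshold` are hypothetical and chosen only for illustration; the sketch covers just the arithmetic of the bound and is not part of the protocol itself.

```python
def min_processes(f: int, e: int) -> int:
    """Smallest n satisfying n >= max(2e + f - 1, 2f + 1).

    f: number of crashes tolerated overall.
    e: number of failures under which the 2-message-delay fast path still works.
    (Hypothetical helper; it only evaluates the bound stated in the abstract.)
    """
    return max(2 * e + f - 1, 2 * f + 1)


def epaxos_fast_threshold(f: int) -> int:
    """e = ceil((f + 1) / 2), the threshold quoted for Egalitarian Paxos."""
    return (f + 2) // 2  # integer form of ceil((f + 1) / 2)


if __name__ == "__main__":
    for f in range(1, 6):
        e = epaxos_fast_threshold(f)
        print(f"f={f} e={e} min n={min_processes(f, e)}")
```

With $e = \lceil\frac{f+1}{2}\rceil$, the bound evaluates to $2f+1$ for every $f$, which matches the configuration $n = 2f+1$ of the original protocol. Choosing a larger $e$ raises the requirement: for example, `min_processes(2, 3)` evaluates to 7 rather than 5.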