Pilotfish: Distributed Transaction Execution for Lazy Blockchains

Pilotfish is the first scale-out blockchain execution engine able to harness any degree of parallelizability present in its workload. Pilotfish allows each validator to employ multiple machines under its control, named ExecutionWorkers, to scale its execution layer. Given a sufficiently parallelizable and compute-intensive load, the number of transactions that the validator can execute increases linearly with the number of ExecutionWorkers at its disposal. In addition, Pilotfish maintains the consistency of the state even when many validators experience simultaneous machine failures. This is possible thanks to the careful co-design of our crash-recovery protocol, which leverages the existing fault tolerance of the blockchain's consensus mechanism. Finally, Pilotfish can also be seen as the first distributed deterministic execution engine that supports dynamic reads, as transactions are not required to provide a fully accurate read and write set. This loosening of requirements would normally reduce the available parallelizability by blocking on write-after-write conflicts, but our novel versioned-queues scheduling algorithm circumvents this by exploiting the lazy recovery property of Pilotfish, which only persists consistent state and re-executes any optimistic steps taken before the crash. To support our claims, we implemented the common path of Pilotfish with support for the MoveVM and evaluated it against the parallel execution MoveVM of Sui. Our results show that our simpler scheduling algorithm outperforms Sui even with a single ExecutionWorker and, more importantly, provides linear scalability up to 4 ExecutionWorkers even for simple asset transfers, and to any number of ExecutionWorkers for more computationally heavy workloads.
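To make the idea of deterministic, queue-based scheduling concrete, the following is a minimal sketch of a generic per-object queue scheduler. It is not the versioned-queues algorithm of Pilotfish, which the abstract does not specify. It assumes that each transaction declares every object it may touch and that every access is treated as a write; the function name `schedule_waves` and the transaction format are hypothetical.

```python
from collections import defaultdict, deque


def schedule_waves(transactions):
    """Group transactions into waves that can run in parallel.

    `transactions` is a list of (tx_id, objects) pairs in consensus order,
    with unique tx_ids (an assumption of this sketch). Every transaction in
    a wave heads the queue of each object it touches, so no two transactions
    in the same wave conflict.
    """
    # One FIFO queue per object, filled in consensus order.
    queues = defaultdict(deque)
    for tx_id, objects in transactions:
        for obj in set(objects):
            queues[obj].append(tx_id)

    touched = {tx_id: set(objects) for tx_id, objects in transactions}
    pending = [tx_id for tx_id, _ in transactions]
    waves = []
    while pending:
        # A transaction is ready once it heads every queue it sits in.
        ready = [t for t in pending
                 if all(queues[o][0] == t for o in touched[t])]
        # Release the objects held by this wave.
        for t in ready:
            for o in touched[t]:
                queues[o].popleft()
        waves.append(ready)
        pending = [t for t in pending if t not in ready]
    return waves


if __name__ == "__main__":
    txs = [("t1", ["a", "b"]), ("t2", ["b", "c"]), ("t3", ["d"]), ("t4", ["a"])]
    print(schedule_waves(txs))
```

Because the queues are built purely from the consensus order, every validator derives the same waves. In a scale-out setting, the transactions of one wave could in principle be dispatched to different ExecutionWorkers, although how Pilotfish partitions work across machines is not described in the abstract.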
