Blockchain provides a decentralized, tamper-resistant ledger for securely recording transactions across a network of untrusted nodes. While its transparency and integrity are beneficial, maintaining a complete transaction history imposes substantial storage requirements. For example, Ethereum nodes require around 23 TB of storage, growing by about 4 TB per year. Prior studies have employed various strategies to mitigate this storage burden. Notably, COLE significantly reduces storage size and improves throughput by adopting a column-based design that incorporates a learned index, effectively eliminating data duplication in the storage layer. However, COLE offers limited support for two operations: chain reorganization during blockchain forks, and state pruning to reduce storage overhead. In this paper, we propose COLE$^+$, an enhanced storage solution designed to address these limitations. COLE$^+$ incorporates a novel rewind-supported in-memory tree structure for handling chain reorganization, leveraging content-defined chunking (CDC) to maintain a consistent hash digest for each block. For on-disk storage, a new two-level Merkle Hash Tree (MHT) structure, called the prunable version tree, is developed to facilitate efficient state pruning. Both theoretical and empirical analyses demonstrate the effectiveness of COLE$^+$ and its potential for practical application in real-world blockchain systems.
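To make the role of content-defined chunking more concrete, the following is a minimal sketch of the general idea, not the COLE$^+$ data structure itself, whose design is not specified in this abstract. It assumes a simplified model in which the state is a set of key-value entries, a chunk boundary is placed after any entry whose SHA-256 hash matches a bit pattern, and the digest is a hash over the chunk hashes. The names `BOUNDARY_MASK`, `entry_bytes`, `chunk_hashes`, and `digest` are hypothetical and chosen only for illustration.

```python
import hashlib

# Hypothetical parameter: a boundary is declared when the low 2 bits of the
# last byte of an entry's hash are zero (about one boundary per 4 entries).
BOUNDARY_MASK = 0x03


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_bytes(key: str, value: str) -> bytes:
    """Serialize an entry with length prefixes so the encoding is unambiguous."""
    k, v = key.encode(), value.encode()
    return len(k).to_bytes(4, "big") + k + len(v).to_bytes(4, "big") + v


def chunk_hashes(entries: dict) -> list:
    """Split key-sorted entries into content-defined chunks and hash each chunk.

    Boundaries depend only on entry content, never on position or on the
    order in which entries were inserted.
    """
    hashes, current = [], []
    for key in sorted(entries):
        encoded = entry_bytes(key, entries[key])
        current.append(encoded)
        if sha256(encoded)[-1] & BOUNDARY_MASK == 0:
            hashes.append(sha256(b"".join(current)))
            current = []
    if current:  # trailing chunk without a closing boundary
        hashes.append(sha256(b"".join(current)))
    return hashes


def digest(entries: dict) -> str:
    """Single digest over all chunk hashes (a one-level stand-in for an MHT root)."""
    return sha256(b"".join(chunk_hashes(entries))).hex()
```

In this simplified model, the digest is a function of the current content only, so undoing an update (as a rewind would) restores the earlier digest, and inserting one entry alters at most one existing chunk rather than shifting every later chunk, as fixed-size chunking would.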