Merkle hash trees are the state-of-the-art method for protecting the integrity of storage systems. However, using a hash tree can severely degrade performance, and prior work on optimizing hash trees has yet to yield a concrete understanding of the scalability of certain designs in the context of large-scale cloud storage systems. In this paper, we take a first-principles approach to analyzing hash tree performance for storage by introducing a definition of an optimal hash tree and a principled methodology for evaluating hash tree designs. We show that state-of-the-art designs are not scalable; they incur up to 40.1X slowdowns over an insecure baseline and deliver <50% of optimal performance across various experiments. We then exploit the characteristics of optimal hash trees to design Dynamic Hash Trees (DHTs), hash trees that can adapt to workload patterns on the fly, delivering >95% of optimal read and write performance and up to 4.2X speedups over the state of the art. Our novel methodology and DHT design provide a new foundation in the search for integrity mechanisms that can operate efficiently at scale.