This study presents a theoretical and empirical analysis of Patricia tries, the data structure underlying Ethereum's state management system. We develop a probabilistic model of the distribution of path lengths in Patricia tries that contain random Ethereum addresses, and we validate the model through extensive computational experiments. The results show that the average path length grows logarithmically with the number of addresses, confirming a property that is crucial for Ethereum's scalability. The model predicts average path lengths with high precision: discrepancies between theoretical and experimental values do not exceed 0.01 across the tested scales, from 100 to 100,000 addresses. We also identify and verify that path length distributions are right-skewed, which gives insight into worst-case scenarios and informs optimization strategies. Statistical analysis, including chi-square goodness-of-fit tests, strongly supports the model's accuracy. The analysis further shows that nodes concentrate at specific trie levels, which suggests avenues for optimizing storage and retrieval mechanisms. Together, these findings deepen the understanding of Ethereum's core data structures and provide a foundation for future optimizations. The study concludes by outlining directions for future research, including extreme-scale behavior, dynamic trie performance, and the applicability of the model to non-uniform address distributions and to other blockchain systems.
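The kind of comparison described above can be illustrated with a small Python sketch. It is not the study's model or code, which the abstract does not specify. It rests on several assumptions made here for illustration: addresses are uniform random 160-bit values written as 40 hexadecimal nibbles, the trie branches on one nibble per level, and the "path length" of an address is the length of the shortest prefix that no other address shares. The function names `random_addresses`, `distinguishing_depths`, and `expected_depth` are hypothetical.

```python
import math
import random
from collections import Counter

NIBBLES = 40  # assumption: a 160-bit address written as 40 hex digits


def random_addresses(n, rng):
    """Draw n uniform random 160-bit addresses as 40-character hex strings."""
    return [f"{rng.getrandbits(160):040x}" for _ in range(n)]


def distinguishing_depths(addresses):
    """For each address, return the length of its shortest prefix that no
    other address shares (the path length under this sketch's assumption)."""
    depths = {}
    active = list(addresses)
    for k in range(1, NIBBLES + 1):
        counts = Counter(a[:k] for a in active)
        still_shared = []
        for a in active:
            if counts[a[:k]] == 1:
                depths[a] = k  # prefix of length k is unique
            else:
                still_shared.append(a)
        active = still_shared
        if not active:
            break
    for a in active:  # only reachable for exact duplicate addresses
        depths[a] = NIBBLES
    return [depths[a] for a in addresses]


def expected_depth(n):
    """Expected path length for n uniform addresses under the same assumption.

    Uses E[D] = 1 + sum_{k>=1} P(D > k), where
    P(D > k) = 1 - (1 - 16**-k)**(n - 1) is the probability that at least
    one of the other n - 1 addresses shares the first k nibbles.
    """
    total = 1.0
    for k in range(1, NIBBLES):
        # -expm1(m * log1p(-p)) evaluates 1 - (1 - p)**m without cancellation
        total += -math.expm1((n - 1) * math.log1p(-(16.0 ** -k)))
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1_000, 10_000):
        depths = distinguishing_depths(random_addresses(n, rng))
        simulated = sum(depths) / n
        print(n, round(simulated, 3), round(expected_depth(n), 3))
```

Under these assumptions the expected depth grows by roughly one level each time the number of addresses is multiplied by 16, which is the logarithmic behaviour the abstract refers to. The sketch makes no attempt to reproduce the reported 0.01 agreement, and a real Merkle Patricia trie adds node types and hashing that are ignored here.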