Internal fragmentation in data structures is a fundamental challenge in database design. A seminal result of Yao shows that evenly splitting the leaves of a B-tree under a workload of uniformly random insertions yields space utilization of about 69% (ln 2 ≈ 0.693). However, many database applications perform batched insertions, in which a short run of consecutive keys is inserted at a single position. We generalize Yao's analysis to give a rigorous treatment of such batched workloads. To do so, we revisit and reformulate the analytical structure underlying Yao's result in a form that admits generalization, and we use this reformulation to argue that even splitting works well for many workloads in our extended class. For the remaining workloads, we develop simple alternative strategies that provably maintain good space utilization.
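The workload model above can be explored empirically. The following is a minimal simulation sketch, not the authors' analysis or code. It assumes that only the leaf level matters, that the internal nodes are represented by a sorted list of separator keys, and that a batch is modeled as `batch_size` consecutive keys `(x, 0), (x, 1), ...` placed at one uniformly random position `x`. The function name `simulate_leaf_splitting` and its parameters are hypothetical.

```python
import bisect
import random


def simulate_leaf_splitting(n_keys, capacity, batch_size=1, seed=0):
    """Insert n_keys keys into the leaf level of a B-tree using even splits.

    Hypothetical model: each batch picks one uniformly random position x in
    [0, 1) and inserts batch_size consecutive keys there, encoded as tuples
    (x, 0) < (x, 1) < ... so that no existing key falls between them.
    Returns (space_utilization, leaves).
    """
    rng = random.Random(seed)
    leaves = [[]]  # each leaf is a sorted list of keys
    seps = []      # seps[i] is the smallest key of leaves[i + 1]
    total = 0
    while total < n_keys:
        x = rng.random()                        # random insertion position
        k = min(batch_size, n_keys - total)
        for j in range(k):
            key = (x, j)                        # next key of the consecutive run
            i = bisect.bisect_right(seps, key)  # route to the owning leaf
            leaf = leaves[i]
            bisect.insort(leaf, key)
            if len(leaf) > capacity:
                mid = len(leaf) // 2            # even split into two halves
                left, right = leaf[:mid], leaf[mid:]
                leaves[i:i + 1] = [left, right]
                seps.insert(i, right[0])        # new separator for the right half
        total += k
    utilization = total / (len(leaves) * capacity)
    return utilization, leaves


if __name__ == "__main__":
    u, leaves = simulate_leaf_splitting(100_000, capacity=64)
    print(f"uniform inserts: {len(leaves)} leaves, utilization {u:.3f}")
    u, _ = simulate_leaf_splitting(100_000, capacity=64, batch_size=16)
    print(f"batches of 16:   utilization {u:.3f}")
```

With `batch_size=1` this reproduces the uniformly random setting of Yao's result, so the measured utilization should land near ln 2; varying `batch_size` gives a rough feel for batched workloads under even splitting. Because every split leaves both halves at least half full, utilization in this model never drops below about 50%.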