Cache-Oblivious Representation of B-Tree Structures

from arXiv, 30 pages + 7 pages of algorithms, 9 figures; changes: paper structure improved, general (sub)tree (re)build added, DFS algorithm simplified, build complexity lowered

We propose CORoBTS, a general data structure for storing B-tree-like search trees dynamically in a cache-oblivious way, combining the van Emde Boas (vEB) memory layout with a packed memory array. So far, work on the vEB layout has mostly considered search complexity. We analyze the complexity of a depth-first search of a subtree and of a contiguous memory area, and provide better insight into the relationship between the positions of vertices in the tree and in memory. We describe how to build an arbitrary tree in the vEB layout, provided that its depth-first search can be simulated. Similarly, we examine batch updates of a packed memory array. In CORoBTS, the stored search tree must have all leaves at the same depth, and every vertex must have arity between the chosen constants $a$ and $b$. The data structure supports searching with an optimal I/O complexity of $\mathcal{O}(\log_B{N})$ and is stored in linear space. It provides operations for inserting and removing a subtree; both have an amortized I/O complexity of $\mathcal{O}(S\cdot(\log^2 N)/B + \log_B N\cdot\log\log S + 1)$ and an amortized time complexity of $\mathcal{O}(S\cdot\log^2 N)$, where $S$ is the size of the subtree and $N$ is the size of the whole stored tree. Rebuilding an existing subtree saves the multiplicative factor $\mathcal{O}(\log^2 N)$ in both complexities if the number of vertices on each tree level does not change; otherwise, the factor is paid only for the inserted or removed vertices. Modifying the cache-oblivious partially persistent array proposed by Davoodi et al. [ESA, pages 296-308. Springer, 2014] to use CORoBTS improves its space complexity from $\mathcal{O}(U^{\log_2 3} + V \log U)$ to $\mathcal{O}(U + V \log U)$, where $U$ is the maximal size of the array and $V$ is the number of versions. The data locality and I/O complexity of both present and persistent reads are unchanged, while the I/O complexity of writes is worsened by a polylogarithmic factor.
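As an illustration of the vEB memory layout mentioned above, the following minimal sketch computes the layout order for a static complete binary tree. It is not the CORoBTS construction itself: the function names `veb_order` and `blocks_touched`, the heap-style vertex numbering (root 1, children `2*i` and `2*i + 1`), and the choice to give the top subtree the rounded-down half of the height are all assumptions made for this example.

```python
def veb_order(root: int, height: int) -> list[int]:
    """Return the vertices of a complete binary tree in vEB layout order.

    Vertices use heap numbering (an assumption of this sketch):
    the root is 1 and vertex i has children 2*i and 2*i + 1.
    `height` is the number of levels in the subtree rooted at `root`.
    """
    if height == 1:
        return [root]
    top_h = height // 2          # levels in the top subtree (rounded down here)
    bottom_h = height - top_h    # levels in each bottom subtree
    # Lay out the top subtree first ...
    order = veb_order(root, top_h)
    # ... then every bottom subtree, left to right. The bottom roots are
    # the descendants of `root` exactly `top_h` levels below it.
    first_bottom_root = root << top_h
    for k in range(1 << top_h):
        order.extend(veb_order(first_bottom_root + k, bottom_h))
    return order


def blocks_touched(order: list[int], leaf: int, block_size: int) -> int:
    """Count distinct memory blocks on the root-to-leaf path ending at `leaf`."""
    position = {vertex: index for index, vertex in enumerate(order)}
    blocks = set()
    vertex = leaf
    while vertex >= 1:
        blocks.add(position[vertex] // block_size)
        vertex //= 2             # move to the parent in heap numbering
    return len(blocks)


if __name__ == "__main__":
    layout = veb_order(1, 4)
    print(layout)
    print(blocks_touched(layout, leaf=8, block_size=3))
```

Because each recursive subtree occupies a contiguous memory range, a root-to-leaf path crosses few blocks regardless of the block size, which the layout never needs to know. This is the property behind the $\mathcal{O}(\log_B{N})$ search bound stated in the abstract.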
