A sorted set (or map) is one of the most widely used data types in computer science. In addition to standard set operations such as Insert, Remove, and Contains, it can provide set-set operations such as Union, Intersection, and Difference. Each of these set-set operations is equivalent to a batched operation: the data structure should be able to execute Insert, Remove, and Contains on a batch of keys. Naturally, we want these "large" operations to be parallelized. Such sets are usually implemented with trees of logarithmic height, such as 2-3 trees, treaps, AVL trees, and red-black trees. Until now, little attention has been devoted to data structures that work asymptotically better under certain restrictions on the stored data. In this work, we parallelize the Interpolation Search Tree, which is expected to serve requests drawn from a smooth distribution in doubly-logarithmic time. Our data structure of size n performs a batch of m operations in O(m log log n) work and polylogarithmic span.
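To make the correspondence between set-set operations and batched operations concrete, the following is a minimal sequential sketch in Python. It is not the parallel Interpolation Search Tree of this work: it uses plain interpolation search over a sorted list of integers as a stand-in for Contains, and all function names (`interpolation_search`, `intersection`, `difference`, `union`) are hypothetical, chosen only for illustration. The batch is assumed to be sorted and free of duplicates.

```python
def interpolation_search(keys, target):
    """Return True if `target` occurs in the sorted list of integers `keys`.

    The probe position is guessed by linear interpolation between the
    endpoints of the current range, instead of always taking the midpoint.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[lo] == keys[hi]:
            # All keys in the range are equal; avoid division by zero.
            return keys[lo] == target
        # Interpolated guess; always lies in [lo, hi] because
        # keys[lo] <= target <= keys[hi].
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return True
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return False


def intersection(keys, batch):
    """Intersection as a batched Contains: keep batch keys that are present."""
    return [k for k in batch if interpolation_search(keys, k)]


def difference(keys, batch):
    """Difference as a batched Remove: drop every batch key that is present."""
    removed = set(intersection(keys, batch))
    return [k for k in keys if k not in removed]


def union(keys, batch):
    """Union as a batched Insert: add every batch key that is absent."""
    new_keys = [k for k in batch if not interpolation_search(keys, k)]
    return sorted(keys + new_keys)


if __name__ == "__main__":
    stored = [1, 3, 5, 7, 9, 11]
    batch = [3, 4, 9, 12]
    print(intersection(stored, batch))  # [3, 9]
    print(difference(stored, batch))    # [1, 5, 7, 11]
    print(union(stored, batch))         # [1, 3, 4, 5, 7, 9, 11, 12]
```

Each batch key is handled independently here, which is what makes such operations natural candidates for parallelization. The sketch rebuilds lists instead of updating a tree in place, so it illustrates only the semantics and says nothing about the work and span bounds stated above.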