The list-labeling problem is one of the most basic and well-studied algorithmic primitives in data structures, with an extensive literature spanning upper bounds, lower bounds, and data management applications. The classical algorithm for this problem, dating back to 1981, has amortized cost $O(\log^2 n)$. Subsequent work has led to improvements in three directions: \emph{low-latency} (worst-case) bounds; \emph{high-throughput} (expected) bounds; and (adaptive) bounds for \emph{important workloads}. Perhaps surprisingly, these three directions of research have remained almost entirely disjoint. This is because, so far, the techniques that allow for progress in one direction have forced worse bounds in the others. Thus there would appear to be a tension between worst-case, adaptive, and expected bounds. List labeling has been proposed for use in databases at least as early as PODS'99, but a database needs good throughput and good response time, and it must adapt to common workloads (e.g., bulk loads); no current list-labeling algorithm achieves good bounds for all three. We show that this tension is not fundamental. In fact, with the help of new data-structural techniques, one can actually \emph{combine} any three list-labeling solutions in order to cherry-pick the best worst-case, adaptive, and expected bounds from each of them.