Deterministic Cache-Oblivious Funnelselect

In the multiple-selection problem one is given an unsorted array $S$ of $N$ elements and an array of $q$ query ranks $r_1<\cdots<r_q$, and the task is to return, in sorted order, the $q$ elements of $S$ with ranks $r_1, \ldots, r_q$, respectively. The asymptotic deterministic comparison complexity of the problem was settled by Dobkin and Munro [JACM 1981]. In the I/O model, an optimal I/O complexity was achieved by Hu et al. [SPAA 2014]. Recently [ESA 2023], we presented a cache-oblivious algorithm with matching I/O complexity, named funnelselect, since it borrows heavily from the cache-oblivious sorting algorithm funnelsort from the seminal paper by Frigo, Leiserson, Prokop and Ramachandran [FOCS 1999]. Funnelselect is inherently randomized, as it relies on sampling to cheaply find many good pivots. In this paper we present deterministic funnelselect, which achieves the same optimal I/O complexity cache-obliviously without randomization. Our new algorithm essentially replaces a single (in expectation) reversed-funnel computation using random pivots by a recursive algorithm using multiple reversed-funnel computations. To meet the I/O bound, this requires a carefully chosen subproblem size based on the entropy of the sequence of query ranks; deterministic funnelselect thus raises distinct technical challenges not encountered in randomized funnelselect. The resulting worst-case I/O bound is $O\bigl(\sum_{i=1}^{q+1} \frac{\Delta_i}{B} \cdot \log_{M/B} \frac{N}{\Delta_i} + \frac{N}{B}\bigr)$, where $B$ is the external memory block size, $M\geq B^{1+\epsilon}$ is the internal memory size, for some constant $\epsilon>0$, and $\Delta_i = r_{i} - r_{i-1}$ (assuming $r_0=0$ and $r_{q+1}=N+1$).
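
To make the problem statement and the shape of the bound concrete, the following is a minimal in-memory sketch in Python. It is not funnelselect and is not cache-oblivious: `multiselect` is a plain recursive partitioning routine with a naive middle-element pivot (an assumption chosen for brevity, with no worst-case guarantee), and `io_bound` merely evaluates the expression inside the $O(\cdot)$ with constant factors ignored. Both function names and all parameter names are hypothetical.

```python
import math


def multiselect(values, ranks):
    """Return the elements with the given 1-based ranks, in sorted order.

    Illustrative recursion only: subproblems containing no query rank are
    skipped, which is the source of the savings over a full sort.
    """
    n = len(values)
    ranks = list(ranks)
    if any(r < 1 or r > n for r in ranks) or any(
        a >= b for a, b in zip(ranks, ranks[1:])
    ):
        raise ValueError("ranks must be strictly increasing and lie in 1..N")

    out = []

    def rec(items, rs, offset):
        # items holds the elements of ranks offset+1 .. offset+len(items).
        if not rs:
            return  # no query falls in this range: leave it unsorted
        pivot = items[len(items) // 2]  # naive pivot choice (assumption)
        less = [x for x in items if x < pivot]
        equal = [x for x in items if x == pivot]
        greater = [x for x in items if x > pivot]
        lo = offset + len(less)  # ranks <= lo lie in `less`
        hi = lo + len(equal)     # ranks lo+1 .. hi equal the pivot
        rec(less, [r for r in rs if r <= lo], offset)
        out.extend(pivot for r in rs if lo < r <= hi)
        rec(greater, [r for r in rs if r > hi], hi)

    rec(list(values), ranks, 0)
    return out


def io_bound(n, ranks, M, B):
    """Evaluate sum_i (Delta_i/B) * log_{M/B}(N/Delta_i) + N/B.

    Returns (gaps, value), where gaps are the Delta_i with r_0 = 0 and
    r_{q+1} = N + 1. Constant factors are ignored; each logarithm is
    clamped at 0 so the degenerate case q = 0 does not go negative.
    """
    bounds = [0] + list(ranks) + [n + 1]
    gaps = [b - a for a, b in zip(bounds, bounds[1:])]
    total = sum(d / B * max(0.0, math.log(n / d, M / B)) for d in gaps)
    return gaps, total + n / B
```

For example, `multiselect([5, 3, 9, 1, 7], [1, 3, 5])` returns `[1, 5, 9]`, and the corresponding gaps $\Delta_1, \ldots, \Delta_4$ are `[1, 2, 2, 1]`. Widely spaced query ranks produce large gaps $\Delta_i$ and hence small logarithmic factors, which is why the bound can fall well below the cost of sorting.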
