QuadRank: Engineering a High Throughput Rank

Given a text, a query $\mathsf{rank}(q, c)$ counts the number of occurrences of character $c$ among the first $q$ characters of the text. Space-efficient methods to answer these rank queries form an important building block in many succinct data structures. For example, the FM-index is a widely used data structure that uses rank queries to locate all occurrences of a pattern in a text. In bioinformatics applications, the goal is usually to process a given input as fast as possible. Thus, data structures should have high throughput when used with many threads.

Contributions. For the binary alphabet, we develop BiRank with 3.28% space overhead. It merges the central ideas of two recent papers: (1) we interleave (inline) offsets in each cache line of the underlying bit vector [Laws et al., 2024], reducing cache misses, and (2) each offset counts up to the middle of its block, so that only half of each block needs popcounting [Gottlieb and Reinert, 2025]. In QuadRank (14.4% space overhead), we extend these techniques to the $σ=4$ (DNA) alphabet. Both data structures require only a single cache miss per query, making them highly suitable for high-throughput and memory-bound settings. To enable efficient batch processing, we support prefetching the cache lines required to answer upcoming queries.

Results. BiRank and QuadRank are around $1.5\times$ and $2\times$ faster, respectively, than similar-overhead methods that do not use inlining. Prefetching gives an additional $2\times$ speedup, at which point the dual-channel DDR4 RAM bandwidth becomes a hard limit on the total throughput. With prefetching, both methods outperform all other methods apart from SPIDER [Laws et al., 2024] by $2\times$. Using QuadRank with prefetching in a toy count-only FM-index, QuadFm, results in a smaller size and up to $4\times$ speedup over Genedex, a state-of-the-art batching FM-index implementation.
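To make the mid-block offset idea concrete, the following is a minimal sketch for the binary alphabet, where the rank counts the 1-bits among the first $q$ bits. It is not the BiRank implementation: the names `naive_rank`, `MidOffsetRank`, `BLOCK`, and `HALF` are hypothetical, the block size of 16 bits is chosen only for readability, and the offsets are kept in a separate list rather than inlined into the cache line that holds the bits. It only illustrates why an offset to the middle of a block means at most half a block is scanned per query.

```python
from itertools import accumulate

BLOCK = 16         # illustrative block size in bits (an assumption, not the paper's value)
HALF = BLOCK // 2  # position of the block middle, relative to the block start


def naive_rank(bits, q):
    """Reference definition: number of 1s among the first q bits."""
    return sum(bits[:q])


class MidOffsetRank:
    """Binary rank with one offset per block, counted up to the block middle."""

    def __init__(self, bits):
        self.n = len(bits)
        # One extra block guarantees that q == n always maps to a valid block.
        n_blocks = self.n // BLOCK + 1
        self.bits = list(bits) + [0] * (n_blocks * BLOCK - self.n)
        # prefix[i] = number of 1s among the first i bits.
        prefix = [0] + list(accumulate(self.bits))
        # Offset of block b = number of 1s before the middle of block b.
        self.mid_offsets = [prefix[b * BLOCK + HALF] for b in range(n_blocks)]

    def rank(self, q):
        if not 0 <= q <= self.n:
            raise IndexError("q out of range")
        b = q // BLOCK
        mid = b * BLOCK + HALF
        if q >= mid:
            # Second half: add the 1s between the middle and q.
            return self.mid_offsets[b] + sum(self.bits[mid:q])
        # First half: subtract the 1s between q and the middle.
        return self.mid_offsets[b] - sum(self.bits[q:mid])
```

In both branches the scanned range `bits[mid:q]` or `bits[q:mid]` spans at most `HALF` bits, which is the property the abstract refers to. A real implementation would replace the `sum` over a list slice with hardware popcount instructions on machine words and would store each offset in the same cache line as its block.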
