Explaining the Inherent Tradeoffs for Suffix Array Functionality: Equivalences between String Problems and Prefix Range Queries

We study the fundamental question of how efficiently suffix array entries can be accessed when the array cannot be stored explicitly. The suffix array $SA_T[1..n]$ of a text $T$ of length $n$ encodes the lexicographic order of its suffixes and underlies numerous applications in pattern matching, data compression, and bioinformatics. Previous work established one-way reductions showing how suffix array queries can be answered using, for example, rank queries on the Burrows-Wheeler Transform. More recently, a new class of prefix queries was introduced, together with reductions that, among others, transform a simple tradeoff for prefix-select queries into a suffix array tradeoff matching state-of-the-art space and query-time bounds, while achieving sublinear construction time. For binary texts, the resulting data structure achieves space $O(n)$ bits, preprocessing time $O(n / \sqrt{\log n})$, preprocessing space of $O(n)$ bits, and query time $O(\log^{\epsilon} n)$ for any constant $\epsilon > 0$. However, whether these bounds could be improved using different techniques has remained open. We resolve this question by presenting the first bidirectional reduction showing that suffix array queries are, up to an additive $O(\log\log n)$ term in query time, equivalent to prefix-select queries in all parameters. This result unifies prior approaches and shows that essentially all efficient suffix array representations can be expressed via prefix-select structures. Moreover, we prove analogous equivalences for inverse suffix array queries, pattern ranking, lexicographic range, and SA-interval queries, identifying six core problem pairs that connect string and prefix query models. Our framework thus provides a unified foundation for analyzing and improving the efficiency of fundamental string-processing problems through the lens of prefix queries.
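
To make the objects named in the abstract concrete, here is a minimal sketch of the suffix array, its inverse, and an SA-interval query, computed naively on a small text. It only illustrates the definitions: it stores the array explicitly and sorts suffixes directly, which is exactly what the compact structures discussed above avoid. It is not the paper's construction, and the function names `suffix_array`, `inverse_suffix_array`, and `sa_interval` are hypothetical, chosen for illustration.

```python
from bisect import bisect_left, bisect_right


def suffix_array(text):
    """Return the 1-indexed suffix array of `text`.

    Naive definition-level construction: sort the starting positions
    1..n by the suffix that begins at each position.
    """
    n = len(text)
    return sorted(range(1, n + 1), key=lambda i: text[i - 1:])


def inverse_suffix_array(sa):
    """Return ISA as a list with ISA[SA[r]] = r (ranks are 1-indexed).

    Index 0 of the returned list is unused padding.
    """
    isa = [0] * (len(sa) + 1)
    for rank, start in enumerate(sa, start=1):
        isa[start] = rank
    return isa


def sa_interval(text, sa, pattern):
    """Return (b, e) such that the suffixes of rank b+1..e are exactly
    those having `pattern` as a prefix (b == e if there are none).
    """
    m = len(pattern)
    # Truncating lexicographically sorted suffixes to length m keeps
    # them in nondecreasing order, so binary search applies.
    prefixes = [text[i - 1:i - 1 + m] for i in sa]
    return bisect_left(prefixes, pattern), bisect_right(prefixes, pattern)


if __name__ == "__main__":
    t = "banana"
    sa = suffix_array(t)
    print(sa)
    print(inverse_suffix_array(sa)[1:])
    print(sa_interval(t, sa, "ana"))
```

For `"banana"`, the sorted suffixes are `a`, `ana`, `anana`, `banana`, `na`, `nana`, so the suffix array lists their starting positions in that order, and the SA-interval of `ana` covers the second and third entries.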
