Suppose we are given a text $T [1..n]$, a straight-line program with $g$ rules for $T$, and an assignment of tags to the characters in $T$ such that the Burrows-Wheeler Transform of $T$ has $r$ runs, the Burrows-Wheeler Transform of the reverse of $T$ has $\bar{r}$ runs, and the tag array has $t$ runs, where the tag array is the list of tags in the lexicographic order of the suffixes starting at the characters the tags are assigned to. If the alphabet size is at most polylogarithmic in $n$, then there is an $O (r + \bar{r} + g + t)$-space index for $T$ with the following property: given a pattern $P [1..m]$, we can compute the maximal exact matches (MEMs) of $P$ with respect to $T$ in $O (m)$ time plus $O (\log n)$ time per MEM and then, for each MEM, list the distinct tags assigned to the first characters of its occurrences in constant time per tag listed, all correctly with high probability.
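To make the definitions concrete, the following is a minimal brute-force sketch of the tag array, its run count $t$, the MEMs of a pattern, and the distinct tags of a MEM. It is not the compressed index described here: it runs in polynomial time and linear space and only illustrates what the index computes. All function names are hypothetical, and the sketch assumes the text ends with a sentinel `$` that is smaller than every other character and that one tag is given per text position.

```python
def suffix_array(text):
    """Start positions of the suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def tag_array(text, tags):
    """Tags listed in the lexicographic order of the suffixes they start."""
    return [tags[i] for i in suffix_array(text)]


def count_runs(seq):
    """Number of maximal runs of equal consecutive items."""
    return sum(1 for i, x in enumerate(seq) if i == 0 or x != seq[i - 1])


def mems(pattern, text):
    """Maximal exact matches of pattern w.r.t. text, as (start, length) pairs."""

    def match_len(i):
        # Length of the longest prefix of pattern[i:] that occurs in text.
        length = 0
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        return length

    lens = [match_len(i) for i in range(len(pattern))]
    # A match starting at i is left-maximal unless the match starting at
    # i - 1 already covers it, i.e. unless lens[i - 1] > lens[i].
    return [(i, l) for i, l in enumerate(lens)
            if l > 0 and (i == 0 or lens[i - 1] <= l)]


def distinct_tags(pattern, text, tags, start, length):
    """Distinct tags on the first characters of the occurrences of a MEM."""
    mem = pattern[start:start + length]
    found = []
    pos = text.find(mem)
    while pos != -1:
        if tags[pos] not in found:
            found.append(tags[pos])
        pos = text.find(mem, pos + 1)
    return found


if __name__ == "__main__":
    text = "GATTACA$"                      # sentinel-terminated toy text
    tags = ["x", "x", "x", "y", "y", "y", "y", "x"]  # illustrative tags
    print(tag_array(text, tags), count_runs(tag_array(text, tags)))
    for start, length in mems("TACAT", text):
        print(start, length, distinct_tags("TACAT", text, tags, start, length))
```

In the sketch, reporting the tags of a MEM scans every occurrence, whereas the index above lists each distinct tag in constant time without visiting the occurrences individually.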