Palindrome pattern matching (pal-matching) is a kind of generalized pattern matching in which two strings $x$ and $y$ of the same length are considered to match (pal-match) if they have the same palindromic structure, i.e., for all $1 \le i < j \le |x| = |y|$, $x[i..j]$ is a palindrome if and only if $y[i..j]$ is a palindrome. The pal-matching problem is to find, in a text, the occurrences of substrings that pal-match a given pattern. Given a text $T$ of length $n$ over an alphabet of size $\sigma$, an index for pal-matching must support, for a given pattern $P$ of length $m$, counting queries, which compute the number $\mathsf{occ}$ of occurrences of $P$, and locating queries, which compute the positions of those occurrences. The authors of~[I et al., Theor. Comput. Sci., 2013] proposed an $O(n \lg n)$-bit data structure that supports counting queries in $O(m \lg \sigma)$ time and locating queries in $O(m \lg \sigma + \mathsf{occ})$ time. In this paper, we propose an FM-index type index for the pal-matching problem, which we call the PalFM-index, that occupies $2n \lg \min(\sigma, \lg n) + 2n + o(n)$ bits of space and supports counting queries in $O(m)$ time. The PalFM-index can also support locating queries in $O(m + \Delta \mathsf{occ})$ time by adding $\frac{n}{\Delta} \lg n + n + o(n)$ bits of space, where $\Delta$ is a parameter chosen from $\{1, 2, \dots, n\}$ in the preprocessing phase.
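To make the definition concrete, the following is a minimal brute-force sketch of pal-matching and of the counting and locating queries. It is not the PalFM-index and does not reflect its time or space bounds; it only checks the definition directly. The function names (`pal_match`, `locate`, `count`) are illustrative assumptions, and positions are reported 0-based, whereas the text above indexes strings from 1.

```python
def is_palindrome(s):
    """Return True if s reads the same forwards and backwards."""
    return s == s[::-1]


def pal_match(x, y):
    """Check the definition directly: x and y pal-match if they have equal
    length and, for every substring range of length >= 2, x's substring is a
    palindrome exactly when y's substring is."""
    if len(x) != len(y):
        return False
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if is_palindrome(x[i:j + 1]) != is_palindrome(y[i:j + 1]):
                return False
    return True


def locate(text, pattern):
    """Naive locating query: 0-based start positions of all substrings of
    text that pal-match pattern."""
    m = len(pattern)
    return [k for k in range(len(text) - m + 1)
            if pal_match(text[k:k + m], pattern)]


def count(text, pattern):
    """Naive counting query: the number occ of pal-matching occurrences."""
    return len(locate(text, pattern))


if __name__ == "__main__":
    print(pal_match("aab", "ccd"))     # same palindromic structure
    print(locate("abacaba", "xyx"))    # start positions of pal-matching substrings
    print(count("abacaba", "xyx"))
```

Note that the pattern `xyx` shares no characters with the text `abacaba`, yet it still has occurrences, because only the palindromic structure is compared.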