A parameterized string (p-string) is a string over an alphabet $(\Sigma_{s} \cup \Sigma_{p})$, where $\Sigma_{s}$ and $\Sigma_{p}$ are disjoint alphabets of static symbols (s-symbols) and parameter symbols (p-symbols), respectively. Two p-strings $x$ and $y$ are said to parameterized match (p-match) if and only if $x$ can be transformed into $y$ by applying a bijection on $\Sigma_{p}$ to every occurrence of a p-symbol in $x$. The indexing problem for p-matching is to preprocess a p-string $T$ of length $n$ so that we can efficiently find the occurrences of substrings of $T$ that p-match a given pattern. Extending the index based on the Burrows-Wheeler Transform (BWT) for exact string pattern matching, Ganguly et al. [SODA 2017] proposed the first compact index for p-matching (named pBWT) and posed the open problem of how to construct it in compact space, i.e., in $O(n \lg |\Sigma_{s} \cup \Sigma_{p}|)$ bits of space. Hashimoto et al. [SPIRE 2022] partially solved this problem by showing how to construct some components of the pBWT for $T$ in $O(n \frac{|\Sigma_{p}| \lg n}{\lg \lg n})$ time in an online manner while reading the symbols of $T$ from right to left. In this paper, we improve the time complexity to $O(n \frac{\lg |\Sigma_{p}| \lg n}{\lg \lg n})$. We remark that removing the multiplicative factor of $|\Sigma_{p}|$ from the complexity is of great interest because it has not been achieved for over a decade in the construction of related data structures such as parameterized suffix arrays, even in the offline setting. We also show that our data structure can support backward search, a core procedure of BWT-based indexes, at any stage of the online construction, making it the first compact index for p-matching that can be constructed in compact space and even in an online manner.
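To make the definition of p-matching concrete, the following is a minimal Python sketch that checks the definition directly and then finds occurrences by brute force. It is not the pBWT or the online construction described above; the function names `p_match` and `p_occurrences`, and the convention of passing $\Sigma_{s}$ as a Python set, are assumptions made only for illustration.

```python
def p_match(x, y, static_symbols):
    """Return True if p-strings x and y p-match.

    static_symbols plays the role of Sigma_s; every other symbol is
    treated as a p-symbol (an assumption of this sketch).
    """
    if len(x) != len(y):
        return False
    forward, backward = {}, {}  # partial bijection on p-symbols and its inverse
    for a, b in zip(x, y):
        if a in static_symbols or b in static_symbols:
            # s-symbols must match exactly and are never renamed.
            if a != b:
                return False
            continue
        # p-symbols must be renamed consistently and injectively.
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True


def p_occurrences(text, pattern, static_symbols):
    """Brute-force scan: start positions of substrings of text that p-match pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_match(text[i:i + m], pattern, static_symbols)]


if __name__ == "__main__":
    sigma_s = set("ab")  # hypothetical s-symbols; x, y, z, u, v act as p-symbols
    print(p_match("xayxb", "yazyb", sigma_s))        # x->y, y->z is a valid renaming
    print(p_match("xy", "zz", sigma_s))              # not injective, so no p-match
    print(p_occurrences("xyayxazza", "uva", sigma_s))
```

The scan takes time proportional to $n$ times the pattern length in the worst case, which is the cost that an index over $T$ is meant to avoid.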