Internal Pattern Matching (IPM) queries on a text $T$, given two fragments $X$ and $Y$ of $T$ such that $|Y|<2|X|$, ask to compute all exact occurrences of $X$ within $Y$. IPM queries were introduced by Kociumaka, Radoszewski, Rytter, and Wale\'n [SODA'15 & SICOMP'24], who showed that they can be answered in $O(1)$ time using a data structure of size $O(n)$ and used this result to answer various queries about fragments of $T$. In this work, we study IPM queries on compressed and dynamic strings. Our result is an $O(\log n)$-time query algorithm applicable to any balanced recompression-based run-length straight-line program (RLSLP). In particular, one can use it on top of the RLSLP of Kociumaka, Navarro, and Prezza [IEEE TIT'23], whose size $O\big(\delta \log \frac{n\log \sigma}{\delta \log n}\big)$ is optimal (among all text representations) as a function of the text length $n$, the alphabet size $\sigma$, and the substring complexity $\delta$. Our procedure does not rely on any preprocessing of the underlying RLSLP, so it can be applied directly on top of the dynamic strings data structure of Gawrychowski, Karczmarz, Kociumaka, {\L}\k{a}cki, and Sankowski [SODA'18], which supports fully persistent updates in logarithmic time with high probability.
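To make the query semantics concrete, the following is a minimal sketch of a naive baseline for an IPM query on an uncompressed Python string. It is not the $O(\log n)$-time algorithm of this work and does not use an RLSLP; it only illustrates the input and output of the query. The function names `ipm_query` and `as_progression`, and the convention of passing fragments as half-open index pairs, are assumptions made for this illustration. The second helper relies on the standard fact that, when $|Y|<2|X|$, the occurrences of $X$ in $Y$ form an arithmetic progression, so the answer admits a constant-size representation.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int]  # half-open index pair (start, end) into the text


def ipm_query(text: str, x: Fragment, y: Fragment) -> List[int]:
    """Naive IPM query: starting positions (in text) of occurrences of
    X = text[x[0]:x[1]] that lie entirely inside Y = text[y[0]:y[1]].

    Runs in O(|X| * |Y|) time; a baseline for illustration only.
    """
    xi, xj = x
    ya, yb = y
    if not (0 <= xi < xj <= len(text)) or not (0 <= ya <= yb <= len(text)):
        raise ValueError("fragments must lie within the text and X must be non-empty")
    m = xj - xi
    if yb - ya >= 2 * m:
        raise ValueError("IPM queries require |Y| < 2|X|")
    pattern = text[xi:xj]
    # Try every start position p such that text[p:p+m] fits inside Y.
    return [p for p in range(ya, yb - m + 1) if text[p:p + m] == pattern]


def as_progression(positions: List[int]) -> Optional[Tuple[int, int, int]]:
    """Encode sorted positions as (first, difference, count), or None if empty.

    Raises ValueError if the positions are not equally spaced.
    """
    if not positions:
        return None
    if len(positions) == 1:
        return (positions[0], 0, 1)
    d = positions[1] - positions[0]
    if any(b - a != d for a, b in zip(positions, positions[1:])):
        raise ValueError("positions do not form an arithmetic progression")
    return (positions[0], d, len(positions))


if __name__ == "__main__":
    T = "abababab"
    occ = ipm_query(T, (0, 4), (0, 7))  # X = "abab", Y = "abababa"
    print(occ, as_progression(occ))
```

For $T=\texttt{abababab}$, $X=T[0..4)$ and $Y=T[0..7)$, the sketch reports the occurrences at positions 0 and 2, which is the progression with first term 0, difference 2, and length 2.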