In this work, we consider pattern matching variants in small space, that is, in the read-only setting, where we want to bound the extra space used on top of the space needed to store the input strings. Our main contribution is a space-time trade-off for the Internal Pattern Matching (IPM) problem, where the goal is to construct a data structure over a string $S$ of length $n$ that allows one to answer the following type of queries: compute the occurrences of a fragment $P$ of $S$ inside another fragment $T$ of $S$, provided that $|T| < 2|P|$. For any $\tau \in [1 .. n/\log^2 n]$, we present a nearly-optimal $\tilde{O}(n/\tau)$-size data structure that can be built in $\tilde{O}(n)$ time using $\tilde{O}(n/\tau)$ extra space, and answers IPM queries in $O(\tau+\log n \log^3 \log n)$ time. IPM queries have been identified as a crucial primitive operation for the analysis of algorithms on strings. In particular, the complexities of several recent algorithms for approximate pattern matching are expressed in terms of the number of calls to a small set of primitive operations that include IPM queries; our data structure allows us to port these results to the small-space setting. We further showcase the applicability of our IPM data structure by using it to obtain space-time trade-offs for the longest common substring and circular pattern matching problems in the asymmetric streaming setting.
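To make the query semantics concrete, the sketch below answers a single IPM query by brute force, in $O(|P| \cdot |T|)$ time with no preprocessing. It is not the data structure described above; it only illustrates what a query returns. It relies on a standard fact about IPM that this abstract does not state: when $|T| < 2|P|$, the occurrences of $P$ in $T$ are equally spaced, so the answer can be reported in constant space as an arithmetic progression. The function name `ipm_query_naive`, the 0-based half-open fragment convention, and the `(first, difference, count)` output format are hypothetical choices made for this illustration.

```python
from typing import Optional, Tuple

# Hypothetical output format: (first, difference, count) describing an
# arithmetic progression of starting positions, given as offsets into T.
Progression = Tuple[int, int, int]


def ipm_query_naive(s: str, p_start: int, p_end: int,
                    t_start: int, t_end: int) -> Optional[Progression]:
    """Brute-force IPM query (illustration only, not the paper's method).

    P = s[p_start:p_end] and T = s[t_start:t_end] are fragments of s
    (0-based, half-open). Requires |P| > 0 and |T| < 2|P|.
    Returns None when P does not occur in T.
    """
    m = p_end - p_start
    len_t = t_end - t_start
    if m <= 0 or not len_t < 2 * m:
        raise ValueError("IPM queries require |P| > 0 and |T| < 2|P|")
    pattern = s[p_start:p_end]
    # Compare P against every window of T of length |P|.
    occ = [i for i in range(len_t - m + 1)
           if s[t_start + i:t_start + i + m] == pattern]
    if not occ:
        return None
    diff = occ[1] - occ[0] if len(occ) > 1 else 0
    # Since |T| < 2|P|, consecutive occurrences are equally spaced,
    # so three integers describe the whole answer.
    assert all(b - a == diff for a, b in zip(occ, occ[1:]))
    return occ[0], diff, len(occ)
```

For example, `ipm_query_naive("aabaaabaa", 0, 5, 0, 9)` asks for the occurrences of `aabaa` in the whole string and returns `(0, 4, 2)`. Note that the difference of the progression need not equal the smallest period of $P$ (here 3) when there are only two occurrences.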