Logarithmic-Time Internal Pattern Matching Queries in Compressed and Dynamic Texts

Internal Pattern Matching (IPM) queries on a text $T$, given two fragments $X$ and $Y$ of $T$ such that $|Y|<2|X|$, ask to compute all exact occurrences of $X$ within $Y$. IPM queries were introduced by Kociumaka, Radoszewski, Rytter, and Waleń [SODA'15 & SICOMP'24], who showed that they can be answered in $O(1)$ time using a data structure of size $O(n)$ and used this result to answer various queries about fragments of $T$. In this work, we study IPM queries on compressed and dynamic strings. Our result is an $O(\log n)$-time query algorithm applicable to any balanced recompression-based run-length straight-line program (RLSLP). In particular, one can use it on top of the RLSLP of Kociumaka, Navarro, and Prezza [IEEE TIT'23], whose size $O\big(\delta \log \frac{n\log \sigma}{\delta \log n}\big)$ is optimal (among all text representations) as a function of the text length $n$, the alphabet size $\sigma$, and the substring complexity $\delta$. Our procedure does not rely on any preprocessing of the underlying RLSLP, which makes it readily applicable on top of the dynamic strings data structure of Gawrychowski, Karczmarz, Kociumaka, Łącki, and Sankowski [SODA'18], which supports fully persistent updates in logarithmic time with high probability.
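
To make the query definition concrete, here is a minimal sketch of what an IPM query returns, written as a naive scan over an uncompressed Python string. It is not the $O(\log n)$ algorithm of the paper and does not touch RLSLPs; the function name `ipm_naive` and the half-open index convention are assumptions made purely for illustration.

```python
def ipm_naive(text, x_start, x_end, y_start, y_end):
    """Naive IPM query (illustrative only, not the paper's algorithm).

    X = text[x_start:x_end] and Y = text[y_start:y_end] are fragments of
    `text`, given as half-open index ranges (an assumed convention).
    Returns the starting positions, as indices into `text`, of all
    occurrences of X that lie entirely inside Y.
    """
    x = text[x_start:x_end]
    # The problem statement restricts queries to |Y| < 2|X|.
    if not (y_end - y_start) < 2 * len(x):
        raise ValueError("IPM queries require |Y| < 2|X|")

    hits = []
    # str.find(sub, start, end) only reports matches fully inside [start, end).
    pos = text.find(x, y_start, y_end)
    while pos != -1:
        hits.append(pos)
        pos = text.find(x, pos + 1, y_end)
    return hits


if __name__ == "__main__":
    t = "abababab"
    # X = "abab" (positions 0..3), Y = "abababa" (positions 0..6)
    print(ipm_naive(t, 0, 4, 0, 7))
```

This scan takes time proportional to the fragment lengths per query, which is exactly the cost that the constant-time and logarithmic-time data structures discussed above avoid.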
