In this paper we give an algorithm for streaming $k$-edit approximate pattern matching that uses space $\widetilde{O}(k^2)$ and time $\widetilde{O}(k^2)$ per arriving symbol. This substantially improves on the recent algorithm of Kociumaka, Porat and Starikovskaya (2022), which uses space $\widetilde{O}(k^5)$ and time $\widetilde{O}(k^8)$ per arriving symbol. In the $k$-edit approximate pattern matching problem we are given a pattern $P$ and a text $T$, and we want to identify all substrings of $T$ that are at edit distance at most $k$ from $P$. In the streaming version of this problem, both the pattern and the text arrive symbol by symbol, and after each symbol of the text we must report whether some suffix of the text read so far is at edit distance at most $k$ from $P$. We measure the total space used by the algorithm and the time needed per arriving symbol.
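To make the reporting condition concrete, the following is a minimal sketch of the textbook dynamic program for approximate matching (commonly attributed to Sellers). It is not the algorithm of this paper. Unlike the streaming model above, it assumes the whole pattern is available before the text arrives, and it spends $O(|P|)$ space and $O(|P|)$ time per text symbol rather than $\widetilde{O}(k^2)$. The function name `streaming_k_edit_matches` is hypothetical and chosen only for illustration.

```python
from typing import Iterable, Iterator


def streaming_k_edit_matches(pattern: str, text_stream: Iterable[str], k: int) -> Iterator[bool]:
    """After each text symbol, yield whether some suffix of the text read so far
    is within edit distance k of `pattern`.

    Baseline sketch only: assumes the pattern is known in advance and keeps one
    DP column of length len(pattern) + 1 (not the O~(k^2) streaming algorithm).
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any suffix of the text read so far.
    # Before any text arrives, the only suffix is empty, so the cost is i deletions.
    col = list(range(m + 1))
    for c in text_stream:
        # The empty pattern prefix matches the empty suffix at cost 0.
        new = [0] * (m + 1)
        for i in range(1, m + 1):
            match_or_sub = col[i - 1] + (pattern[i - 1] != c)  # align pattern[i-1] with c
            skip_text = col[i] + 1                              # c is an extra text symbol
            skip_pattern = new[i - 1] + 1                       # pattern[i-1] is unmatched
            new[i] = min(match_or_sub, skip_text, skip_pattern)
        col = new
        # Report whether the whole pattern is within distance k of some current suffix.
        yield col[m] <= k


if __name__ == "__main__":
    # Suffixes "ab", "abx", "abxc" are each within distance 1 of "abc".
    print(list(streaming_k_edit_matches("abc", "xxabxc", 1)))
```

Running the example prints `[False, False, False, True, True, True]`. The single DP column is the reason this baseline's cost grows with $|P|$; the point of the result above is to make the space and per-symbol time depend on $k$ alone, up to polylogarithmic factors.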