In this paper we provide a new locally consistent decomposition of strings. Each string $x$ is decomposed into blocks that can be described by grammars of size $\widetilde{O}(k)$ (using some amount of randomness). If we take two strings $x$ and $y$ of edit distance at most $k$, then their block decompositions use the same number of grammars, and the $i$-th grammar of $x$ is the same as the $i$-th grammar of $y$ for all but at most $k$ indices $i$. The edit distance of $x$ and $y$ equals the sum of the edit distances of the pairs of blocks in which $x$ and $y$ differ. Our decomposition can be used to design a sketch of size $\widetilde{O}(k^2)$ for edit distance, and also a rolling sketch for edit distance of size $\widetilde{O}(k^2)$. The rolling sketch allows updating the sketched string by appending a symbol or removing a symbol from the beginning of the string.
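The additivity property stated above can be illustrated with a toy example. The sketch below is not the decomposition algorithm of the paper: the block boundaries are chosen by hand, the blocks are plain strings rather than grammars, and the names `edit_distance` and `distance_from_blocks` are hypothetical helpers introduced only for illustration. It shows what it means for the edit distance of two strings to equal the sum of the edit distances over the aligned block pairs that differ.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insert, delete, substitute) via row-by-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # match or substitute
        prev = cur
    return prev[-1]


def distance_from_blocks(blocks_x: List[str], blocks_y: List[str]) -> int:
    """Sum the edit distances over aligned block pairs that differ.

    Assumption: both decompositions have the same number of blocks, as the
    abstract states for strings of edit distance at most k.
    """
    if len(blocks_x) != len(blocks_y):
        raise ValueError("decompositions must have the same number of blocks")
    return sum(edit_distance(bx, by)
               for bx, by in zip(blocks_x, blocks_y)
               if bx != by)


# Hand-picked (hypothetical) decompositions; only blocks 1 and 3 differ.
blocks_x = ["local", "ly ", "consis", "tent"]
blocks_y = ["local", "y ", "consis", "tant"]
x = "".join(blocks_x)
y = "".join(blocks_y)

print(edit_distance(x, y), distance_from_blocks(blocks_x, blocks_y))
```

For an arbitrary choice of block boundaries, the blockwise sum is only an upper bound on the edit distance; the point of the decomposition described above is that equality is guaranteed for strings of edit distance at most $k$.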