Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of the same string $T$; thus, CDAWGs are a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single-character edit operation (insertion, deletion, or substitution) is performed at the left end of the input string $T$; namely, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ does not exceed $e$. Further, we present a matching lower bound on the sensitivity of CDAWGs for left-end insertions, and almost matching lower bounds for left-end deletions and substitutions. We then generalize our lower-bound instance for left-end insertions to leftward online construction of the CDAWG, and show that this construction requires $\Omega(n^2)$ time for some string of length $n$.
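To make the size measure $e$ concrete, here is a brute-force Python sketch that counts CDAWG edges for very short strings and compares the count before and after a left-end edit. It assumes that a unique end-marker character not occurring in $T$ is appended (the code's default is a dollar sign), so that all suffix-tree leaves merge into a single sink. It also relies on the standard characterization that, apart from the source and the sink, CDAWG nodes correspond to substrings of the marked string that are both right-branching and left-maximal, with one out-edge per distinct following character. The helper names `cdawg_edge_count` and `left_edit` are hypothetical, and the paper's own conventions (for example, whether an end-marker is used) may give different counts. The sketch reports total edge counts only. It does not identify which edges are new, so it illustrates the size change rather than checking the paper's bound.

```python
"""Brute-force CDAWG edge counting for tiny strings (illustrative sketch)."""


def substrings(s: str) -> set[str]:
    """All distinct substrings of s, including the empty string."""
    return {s[i:j] for i in range(len(s)) for j in range(i, len(s) + 1)}


def occurrences(s: str, x: str) -> list[int]:
    """Start positions of x in s (overlapping occurrences included)."""
    return [i for i in range(len(s) - len(x) + 1) if s.startswith(x, i)]


def right_extensions(s: str, x: str) -> set[str]:
    """Distinct characters c such that x + c occurs in s."""
    return {s[i + len(x)] for i in occurrences(s, x) if i + len(x) < len(s)}


def is_left_maximal(s: str, x: str) -> bool:
    """True if x is a prefix of s or is preceded by two distinct characters."""
    occ = occurrences(s, x)
    if 0 in occ:
        return True
    return len({s[i - 1] for i in occ}) >= 2


def cdawg_edge_count(t: str, end: str = "$") -> int:
    """Number of edges in the CDAWG of t + end (hypothetical helper).

    Assumes `end` is a unique end-marker not occurring in t. The source
    (empty string) is always a node; every other non-sink node is a
    substring that is right-branching and left-maximal. Each such node
    has one out-edge per distinct character that can follow it.
    """
    if end in t:
        raise ValueError("end-marker must not occur in t")
    s = t + end
    edges = 0
    for x in substrings(s):
        ext = right_extensions(s, x)
        if x == "" or (len(ext) >= 2 and is_left_maximal(s, x)):
            edges += len(ext)
    return edges


def left_edit(t: str, op: str, c: str = "") -> str:
    """Apply one left-end edit: 'insert', 'delete', or 'substitute' (hypothetical helper)."""
    if op == "insert":
        return c + t
    if op == "delete":
        return t[1:]
    if op == "substitute":
        return c + t[1:]
    raise ValueError(f"unknown edit operation: {op}")


if __name__ == "__main__":
    t = "abab"
    e = cdawg_edge_count(t)
    for op, c in [("insert", "b"), ("delete", ""), ("substitute", "b")]:
        t2 = left_edit(t, op, c)
        print(f"{op:<10} {t!r} -> {t2!r}: e = {e} -> {cdawg_edge_count(t2)}")
```

Under these assumptions, `cdawg_edge_count("abab")` is 5 and `cdawg_edge_count("babab")` is 7, so prepending `b` adds two edges in this small case. The enumeration is cubic or worse in $|T|$ and is meant only for checking tiny examples by hand.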