Compact directed acyclic word graphs (CDAWGs) [Blumer et al. 1987] are a fundamental data structure on strings with applications in text pattern searching, data compression, and pattern discovery. Intuitively, the CDAWG of a string $T$ is obtained by merging isomorphic subtrees of the suffix tree [Weiner 1973] of $T$, which makes CDAWGs a compact indexing structure. In this paper, we investigate the sensitivity of CDAWGs when a single-character edit operation (insertion, deletion, or substitution) is performed at the left end of the input string $T$; that is, we are interested in the worst-case increase in the size of the CDAWG after a left-end edit operation. We prove that if $e$ is the number of edges of the CDAWG for string $T$, then the number of new edges added to the CDAWG after a left-end edit operation on $T$ is less than $e$. Further, we present almost matching lower bounds on the sensitivity of CDAWGs for all cases of insertion, deletion, and substitution.
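To make the intuition of merging isomorphic subtrees concrete, the following is a minimal brute-force sketch, not the construction used in the paper. It assumes a terminator character `$` that does not occur in $T$ (the abstract does not state whether a terminator is used), and the function name `cdawg_edge_count` is hypothetical. It builds the suffix trie of $T\$$, compacts unary paths into suffix-tree edges, merges subtrees that are identical including their edge labels, and counts the remaining edges. It is intended only for short strings.

```python
def cdawg_edge_count(text, terminator="$"):
    """Brute-force count of CDAWG edges for text + terminator.

    Assumption: the terminator does not occur in text.
    """
    if terminator in text:
        raise ValueError("terminator must not occur in text")
    s = text + terminator

    # Step 1: suffix trie as nested dicts, one child per character.
    trie = {}
    for i in range(len(s)):
        node = trie
        for ch in s[i:]:
            node = node.setdefault(ch, {})

    # Steps 2 and 3: compact unary paths into labeled edges and give each
    # subtree a canonical key; isomorphic subtrees share one key.
    merged = {}  # canonical subtree key -> number of outgoing edges

    def canon(node):
        children = []
        for ch, child in node.items():
            label = ch
            # Follow a chain of single-child nodes to form one edge label.
            while len(child) == 1:
                (nxt, grandchild), = child.items()
                label += nxt
                child = grandchild
            children.append((label, canon(child)))
        key = tuple(sorted(children))
        merged[key] = len(children)
        return key

    canon(trie)
    # Each distinct subtree is one CDAWG node; sum their out-degrees.
    return sum(merged.values())


if __name__ == "__main__":
    T = "abab"
    print(cdawg_edge_count(T))        # size before the edit
    print(cdawg_edge_count("a" + T))  # size after a left-end insertion
```

For `"abab"` under these assumptions, the suffix tree of `abab$` has 7 edges, while the two internal nodes reached by `ab` and `b` have identical subtrees and are merged, leaving 5 edges. Comparing the counts for $T$ and for an edited string such as `"a" + T` gives a small-scale way to experiment with the sensitivity question studied here.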