An absent factor of a string $w$ is a string $u$ that does not occur as a contiguous substring (a.k.a. factor) of $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as a subsequence (a.k.a. scattered factor) of $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences all of whose proper subsequences are subsequences of $w$, and shortest absent subsequences, i.e., absent subsequences of minimal length. We present a series of combinatorial and algorithmic results regarding these two notions. For instance: we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences of a word, as well as compact representations of these sets; we show how to test efficiently whether a string is a shortest or minimal absent subsequence of a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; finally, we show how to efficiently compute a data structure for answering shortest absent subsequence queries for the factors of a given string.
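To make the two notions concrete, the following is a minimal Python sketch of the membership tests. It is an illustration under stated assumptions, not the algorithms of this work: the function names are hypothetical, the alphabet is passed explicitly, and the length of a shortest absent subsequence is obtained from the standard greedy arch factorisation (the number of complete arches of $w$ plus one), a known fact that the text above does not itself state. Minimality is checked by deleting one letter at a time, which suffices because the subsequence relation is transitive; this simple check takes quadratic time and is not meant to be efficient.

```python
def is_subsequence(u, w):
    """Return True if u occurs in w as a (scattered) subsequence."""
    it = iter(w)
    # Each `ch in it` consumes the iterator up to the next match of ch.
    return all(ch in it for ch in u)


def is_minimal_absent_subsequence(u, w):
    """u is absent from w, but every proper subsequence of u occurs in w.

    Checking the strings obtained by deleting a single letter is enough,
    since every proper subsequence of u is a subsequence of one of them.
    """
    if is_subsequence(u, w):
        return False
    return all(is_subsequence(u[:i] + u[i + 1:], w) for i in range(len(u)))


def shortest_absent_length(w, alphabet):
    """Length of a shortest absent subsequence of w over `alphabet`.

    Assumption: uses the greedy arch factorisation, where an arch is a
    shortest block containing every letter of the alphabet; the result is
    the number of complete arches plus one.
    """
    alphabet = set(alphabet)
    seen = set()
    arches = 0
    for ch in w:
        if ch in alphabet:
            seen.add(ch)
        if len(seen) == len(alphabet):
            arches += 1
            seen.clear()
    return arches + 1


def is_shortest_absent_subsequence(u, w, alphabet):
    """u is absent from w and has the minimal possible length."""
    return (len(u) == shortest_absent_length(w, alphabet)
            and not is_subsequence(u, w))
```

For example, with $w = \mathtt{abcab}$ over the alphabet $\{\mathtt{a}, \mathtt{b}, \mathtt{c}\}$, the string $\mathtt{cc}$ is both a shortest and a minimal absent subsequence, whereas $\mathtt{aaa}$ is minimal but not shortest, and $\mathtt{acc}$ is absent but not minimal because its proper subsequence $\mathtt{cc}$ is already absent.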