An absent factor of a string $w$ is a string $u$ that does not occur as a contiguous substring (a.k.a. factor) of $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as a subsequence (a.k.a. scattered factor) of $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences all of whose proper subsequences occur in $w$, and shortest absent subsequences, i.e., absent subsequences of minimal length. We show a series of combinatorial and algorithmic results regarding these two notions. For instance, we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences of a word, as well as compact representations of these sets; we show how to test efficiently whether a string is a shortest or minimal absent subsequence of a word, and we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; finally, we show how to efficiently compute a data structure for answering shortest-absent-subsequence queries on the factors of a given string.
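The three notions defined in the abstract can be made concrete with a small Python sketch. This is a naive illustration and not the algorithms announced in the abstract; all function names are hypothetical. It assumes the alphabet is passed explicitly and contains every letter of the strings involved. The length of a shortest absent subsequence is computed with the standard arch factorisation: scan $w$ greedily, close an arch each time every letter of the alphabet has been seen, and add one to the number of completed arches.

```python
def is_subsequence(u, w):
    """True if u occurs in w as a (scattered) subsequence."""
    it = iter(w)
    # `c in it` advances the iterator, so letters are matched left to right.
    return all(c in it for c in u)


def is_absent(u, w):
    """True if u is an absent subsequence of w."""
    return not is_subsequence(u, w)


def is_minimal_absent(u, w):
    """True if u is absent from w but every proper subsequence of u occurs in w.

    Checking the strings obtained by deleting one letter is enough, because
    every subsequence of a subsequence of w is again a subsequence of w.
    """
    if is_subsequence(u, w):
        return False
    return all(is_subsequence(u[:i] + u[i + 1:], w) for i in range(len(u)))


def shortest_absent_length(w, alphabet):
    """Length of a shortest absent subsequence of w over `alphabet`.

    Assumption: `alphabet` is non-empty and contains every letter of w.
    """
    sigma = set(alphabet)
    if not sigma:
        raise ValueError("alphabet must be non-empty")
    seen = set()
    arches = 0
    for c in w:
        seen.add(c)
        if seen >= sigma:  # all letters seen: one arch is complete
            arches += 1
            seen = set()
    return arches + 1


def is_shortest_absent(u, w, alphabet):
    """True if u is absent from w and has the minimal possible length."""
    return len(u) == shortest_absent_length(w, alphabet) and is_absent(u, w)
```

For example, with $w = \texttt{aab}$ over the alphabet $\{\texttt{a}, \texttt{b}\}$, the string $\texttt{ba}$ is a shortest (and hence minimal) absent subsequence, while $\texttt{aaa}$ is minimal but not shortest, and $\texttt{aaaa}$ is absent but neither minimal nor shortest.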