An absent factor of a string $w$ is a string $u$ that does not occur as a contiguous substring (a.k.a. factor) of $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string $w$ if $u$ does not occur as a subsequence (a.k.a. scattered factor) of $w$. Of particular interest to us are minimal absent subsequences, i.e., absent subsequences all of whose proper subsequences occur in $w$, and shortest absent subsequences, i.e., absent subsequences of minimal length. We present a series of combinatorial and algorithmic results regarding these two notions. In particular, we give combinatorial characterisations of the sets of minimal and, respectively, shortest absent subsequences of a word, as well as compact representations of these sets; we show how to test efficiently whether a string is a shortest or a minimal absent subsequence of a word; we give efficient algorithms computing the lexicographically smallest absent subsequence of each kind; and we show how to efficiently construct a data structure answering shortest-absent-subsequence queries for the factors of a given string.
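To make the definitions concrete, the following is a minimal Python sketch of the three tests they suggest. It is an illustration only, not the algorithms of this work: the function names and the explicit `alphabet` parameter are assumptions introduced here, every letter of $w$ is assumed to belong to `alphabet`, and the length of a shortest absent subsequence is obtained with the standard greedy arch factorisation (the number of consecutive blocks of $w$ that each contain every letter, plus one).

```python
def is_subsequence(u, w):
    """Return True if u occurs in w as a (scattered) subsequence."""
    it = iter(w)
    # `c in it` advances the iterator, so the letters are matched left to right.
    return all(c in it for c in u)


def is_absent(u, w):
    """Return True if u is an absent subsequence of w."""
    return not is_subsequence(u, w)


def is_minimal_absent(u, w):
    """Return True if u is absent from w but every proper subsequence occurs.

    Checking the strings obtained by deleting a single letter is enough:
    any proper subsequence of u is a subsequence of one of them.
    """
    if is_subsequence(u, w):
        return False
    return all(is_subsequence(u[:i] + u[i + 1:], w) for i in range(len(u)))


def shortest_absent_length(w, alphabet):
    """Length of a shortest absent subsequence of w over `alphabet`.

    Assumption: every letter of w belongs to `alphabet`.
    Greedy arch factorisation: close an arch as soon as all letters were seen.
    """
    sigma = set(alphabet)
    seen = set()
    arches = 0
    for c in w:
        seen.add(c)
        if seen == sigma:
            arches += 1
            seen = set()
    return arches + 1


def is_shortest_absent(u, w, alphabet):
    """Return True if u is an absent subsequence of w of minimal length."""
    return len(u) == shortest_absent_length(w, alphabet) and is_absent(u, w)
```

For example, for $w = \mathtt{abcab}$ over the alphabet $\{\mathtt{a},\mathtt{b},\mathtt{c}\}$, the string $\mathtt{cc}$ is a shortest absent subsequence, while $\mathtt{aaa}$ is a minimal absent subsequence that is not shortest. Each test in this sketch runs in time linear in $|w|$ per candidate check, so the minimality test takes $O(|u|\cdot|w|)$ time overall.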