It is well known that checking whether a given string $w$ matches a given regular expression $r$ can be done in quadratic time $O(|w|\cdot |r|)$, and that this cannot be improved to a truly subquadratic running time of $O((|w|\cdot |r|)^{1-\varepsilon})$ for any $\varepsilon > 0$, assuming the strong exponential time hypothesis (SETH). We study the related problem that asks whether $w$ has a \emph{subsequence} that matches $r$, and we show that, surprisingly, this task admits an algorithm that runs in linear time, i.e., in $O(|w| + |r|)$. We further show that the same holds if we ask for a supersequence instead of a subsequence. Moreover, we show that the \emph{quantitative} problems of computing a longest subsequence or a shortest supersequence of $w$ that matches $r$ can be solved with the same complexity as the classical longest common subsequence and shortest common supersequence problems, i.e., in $O(|w|\cdot |r|)$, and conditionally not in $O((|w|\cdot|r|)^{1 - \varepsilon})$. By contrast, if instead of subsequences or supersequences we consider other string relations, namely the infix, prefix, left-extension, or extension relations, then all the corresponding problems (both quantitative and non-quantitative) have the same complexity as classical regex matching, i.e., they can be solved in $O(|w|\cdot |r|)$, but not in $O((|w|\cdot|r|)^{1 - \varepsilon})$ assuming SETH. Finally, we study the complexity of the \emph{universal} problem that asks whether \emph{all} subsequences (or supersequences, infixes, prefixes, left-extensions, or extensions) of an input string match a given regular expression. For these problems, we show polynomial-time upper bounds (along with matching conditional lower bounds) for the infix and prefix relations, but PSPACE-completeness for the extension, left-extension, and supersequence relations, and coNP-completeness for the subsequence relation.
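To make the subsequence problems concrete, here is a minimal brute-force sketch of the decision and quantitative variants. It is not the linear-time algorithm of the paper: it enumerates all subsequences, so it takes exponential time and only illustrates the problem definitions. The function names are hypothetical, and Python's `re` syntax is assumed as a stand-in for regular expressions.

```python
import re
from itertools import combinations


def subsequences(w):
    """Yield every subsequence of w, longest first (exponentially many)."""
    for k in range(len(w), -1, -1):
        for idx in combinations(range(len(w)), k):
            yield "".join(w[i] for i in idx)


def longest_matching_subsequence(w, pattern):
    """Return a longest subsequence of w that fully matches pattern, or None."""
    compiled = re.compile(pattern)
    for s in subsequences(w):
        # fullmatch: the whole subsequence must match, not just a part of it
        if compiled.fullmatch(s):
            return s
    return None


def has_matching_subsequence(w, pattern):
    """Decision variant: does some subsequence of w match pattern?"""
    return longest_matching_subsequence(w, pattern) is not None
```

For instance, with $w = \texttt{abcab}$ and $r = \texttt{a+b}$, the subsequence `ab` matches, and a longest matching subsequence is `aab`; with $w = \texttt{ba}$ and $r = \texttt{ab}$, no subsequence matches.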