The regular expression matching problem asks whether a given regular expression of length $m$ matches a given string of length $n$. As is well known, the problem can be solved in $O(nm)$ time using Thompson's algorithm. Moreover, recent studies have shown that regular expression matching extended with lookaround, a practical extension, can be solved within the same time bound. In this work, we consider four well-known extensions to regular expressions: backreference, squaring, intersection, and complement. We prove a number of novel time complexity lower bounds for regular expression matching with these extensions under the Orthogonal Vectors Conjecture (OVC), $k$-OVC, the $k$-Clique Hypothesis, and the Combinatorial $k$-Clique Hypothesis. Highlights of our results include that, under OVC, none of the matching problems with these extensions can be solved in $n^{2-\varepsilon} \mathrm{poly}(m)$ time for any constant $\varepsilon > 0$ (for backreference, even when restricted to one capturing group), and that the problem with complement, also known as extended regular expression (ERE) matching, cannot be solved in time $n^{2-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under OVC, $n^{\omega-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under the $k$-Clique Hypothesis (where $\omega$ is the matrix multiplication exponent), and $n^{3-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under the Combinatorial $k$-Clique Hypothesis. In particular, the latter two results show that the $O(n^3 m)$-time ERE matching algorithm introduced by Hopcroft and Ullman in 1979, and recently improved by Bille, Gørtz and Jessen to run in $O(n^{\omega} m)$ time using fast matrix multiplication, was already optimal in a sense. They also shed light on why the theoretical computer science community has struggled for more than 45 years to improve the time complexity of ERE matching with respect to $n$ and $m$.