The regular expression matching problem asks whether a given regular expression of length $m$ matches a given string of length $n$. As is well known, the problem can be solved in $O(nm)$ time using Thompson's algorithm. Moreover, recent studies have shown that regular expression matching with lookaround, a practical extension, can be solved within the same time bound. In this work, we consider four well-known extensions to regular expressions: backreference, squaring, intersection, and complement. We prove a number of novel time complexity lower bounds for regular expression matching with these extensions under the Orthogonal Vectors Conjecture (OVC), $k$-OVC, the $k$-Clique Hypothesis, and the Combinatorial $k$-Clique Hypothesis. Highlights of our results include the fact that none of the matching problems with these extensions can be solved in $n^{2-\varepsilon} \mathrm{poly}(m)$ time for any constant $\varepsilon > 0$ under OVC (for backreference, even when restricted to one capturing group), and that the problem with complement, also known as extended regular expression (ERE) matching, cannot be solved in time $n^{2-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under OVC, $n^{\omega-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under the $k$-Clique Hypothesis (where $\omega$ is the matrix multiplication exponent), and $n^{3-\varepsilon}\mathrm{tower}(o(\sqrt{m}))$ under the Combinatorial $k$-Clique Hypothesis. In particular, the latter two results show that the $O(n^3 m)$-time ERE matching algorithm introduced by Hopcroft and Ullman in 1979, recently improved by Bille, Gørtz and Jessen to run in $O(n^{\omega} m)$ time using fast matrix multiplication, is already optimal in a sense. They also shed light on why the theoretical computer science community has struggled to improve the time complexity of ERE matching with respect to $n$ and $m$ for more than 45 years.
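For readers unfamiliar with the hypothesis behind the quadratic lower bounds: the Orthogonal Vectors problem asks whether two sets of $n$ Boolean vectors of dimension $d$ contain a pair, one vector from each set, whose inner product is zero. OVC, in its common formulation, asserts that no algorithm solves this in $n^{2-\varepsilon}\mathrm{poly}(d)$ time for any constant $\varepsilon > 0$. The following is a minimal Python sketch of the brute-force $O(n^2 d)$ baseline that the conjecture says cannot be substantially improved. The function name `has_orthogonal_pair` and the sample inputs are illustrative assumptions and do not come from the paper.

```python
from itertools import product


def has_orthogonal_pair(A, B):
    """Return True if some a in A and b in B have inner product zero.

    A and B are sequences of equal-length 0/1 tuples. This is the naive
    quadratic check: it tries every pair (a, b), spending O(d) per pair.
    """
    for a, b in product(A, B):
        # Two Boolean vectors are orthogonal when no coordinate is 1 in both.
        if all(x * y == 0 for x, y in zip(a, b)):
            return True
    return False


# Illustrative input (hypothetical data, not from the paper).
A = [(1, 0, 1), (0, 1, 1)]
B = [(1, 1, 0), (0, 1, 0)]
print(has_orthogonal_pair(A, B))
```

A conditional lower bound of the kind stated above is typically obtained by encoding such an instance as a regular expression and a string, so that a faster matching algorithm would yield a faster Orthogonal Vectors algorithm.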