Regular expressions with backreferences (regex, for short), as supported by most modern libraries for regular expression matching, have an NP-complete matching problem. We define a complexity parameter of regex, called active variable degree, such that regex with this parameter bounded by a constant can be matched in polynomial time. Moreover, we formulate a novel type of determinism for regex (on an automaton-theoretic level), which yields the class of memory-deterministic regex that can be matched in time O(|w|p(|r|)) for a polynomial p (where r is the regex and w the word). Natural extensions of these concepts lead to properties of regex that are intractable to check.
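To illustrate what a backreference adds, here is a minimal sketch using Python's standard `re` module. It is not taken from the paper: the pattern, the alphabet {a, b}, and the helper name `is_square` are assumptions chosen for illustration. The pattern stores a nonempty word in group 1 and then requires the same word again via `\1`, so it matches exactly the words of the form xx, a language that no classical regular expression can describe.

```python
import re

# Hypothetical example pattern: group 1 captures a nonempty word over {a, b},
# and the backreference \1 demands an exact repetition of that captured word.
SQUARE = re.compile(r"([ab]+)\1")


def is_square(word: str) -> bool:
    """Return True if the whole word has the form xx for some nonempty x."""
    return SQUARE.fullmatch(word) is not None


if __name__ == "__main__":
    for w in ["abab", "abaaba", "abba", "aba"]:
        print(w, is_square(w))
```

This pattern uses a single captured variable. The intractability mentioned above concerns the general case, where the number of backreferenced variables is not bounded.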