In this work, we lay out a new theory showing that every automaton can be co-lexicographically partially ordered, and that an intrinsic measure of its complexity can be defined and effectively computed: the minimum width $p$ among its admissible co-lex partial orders, which we call the automaton's co-lex width. We first show that this measure simultaneously captures the complexity of several seemingly unrelated hard problems on automata. Any NFA of co-lex width $p$: (i) has an equivalent powerset DFA whose size is exponential in $p$ rather than (as the classic analysis shows) in the NFA's size; (ii) can be encoded using just $\Theta(\log p)$ bits per transition; (iii) admits a linear-space data structure that solves regular expression matching queries in time proportional to $p^2$ per matched character. Among the consequences of this new parametrization of automata, PSPACE-hard problems such as NFA equivalence are FPT in $p$, and the quadratic lower bounds for regular expression matching do not hold for sufficiently small $p$. We then prove that a canonical minimum-width DFA accepting a language $\mathcal L$, which we call the Hasse automaton $\mathcal H$ of $\mathcal L$, can be exhibited. Finally, we explore the relationship between two conflicting objectives: minimizing the width and minimizing the number of states of a DFA. In this context, we provide an analogue of the Myhill-Nerode Theorem for co-lexicographically ordered regular languages.