In the present work, we tackle the regular language indexing problem by first studying the hierarchy of $p$-sortable languages: regular languages accepted by automata of width $p$. We show that the hierarchy is strict and does not collapse, and we provide upper and lower bounds (exponential in $p$) relating the minimum widths of equivalent NFAs and DFAs. Our bounds highlight the importance of being able to index NFAs, since doing so allows regular languages to be indexed with much faster and smaller indexes. Our second contribution solves precisely this problem, optimally: we devise a polynomial-time algorithm that indexes any NFA with the optimal value of $p$ (its width), without explicitly computing $p$, which is NP-hard. In particular, this implies that we can index in polynomial time the well-studied case $p=1$ (Wheeler NFAs). More generally, in polynomial time we can build an index that breaks the worst-case conditional lower bound of $\Omega(|P| m)$, where $|P|$ is the pattern length and $m$ is the number of transitions of the NFA, whenever the input NFA's width is $p \in o(\sqrt{m})$.
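To make the notion of width concrete: in this line of work, an automaton's width is, roughly, the smallest width of a co-lex partial order on its states (this definition is assumed here and is not spelled out above), and the width of a partial order is the size of its largest antichain. Width $p=1$ means the order is total, which is the Wheeler case. The sketch below, built around a hypothetical helper `poset_width`, computes the width of a *given* strict partial order via Dilworth's theorem (width equals the minimum number of chains covering the set, which equals $n$ minus a maximum bipartite matching). It does not search for the best order; choosing that order is the hard part, which the abstract notes is NP-hard.

```python
def poset_width(n, less):
    """Width (largest antichain size) of a strict partial order on states 0..n-1.

    `less` is a set of pairs (u, v) meaning u < v. It is assumed to be
    transitively closed. By Dilworth's theorem the width equals the minimum
    number of chains covering all states, i.e. n minus the size of a maximum
    matching in the bipartite graph with an edge u -> v for every u < v.
    """
    succ = [[] for _ in range(n)]
    for u, v in less:
        succ[u].append(v)
    match_right = [-1] * n  # match_right[v] = left vertex u matched to v, or -1

    def try_augment(u, seen):
        # Kuhn's algorithm: look for an augmenting path starting at left vertex u.
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matching = sum(try_augment(u, set()) for u in range(n))
    return n - matching


# A total order on 4 states has width 1 (the Wheeler-like case).
total = {(i, j) for i in range(4) for j in range(i + 1, 4)}
print(poset_width(4, total))

# Two incomparable chains 0 < 1 and 2 < 3 give width 2.
print(poset_width(4, {(0, 1), (2, 3)}))
```

Kuhn's augmenting-path matching is used only because it is short and self-contained; any maximum bipartite matching algorithm yields the same width.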