Minimizing speculation overhead in a parallel recognizer for regular texts

Speculative data-parallel algorithms for language recognition have been widely tested on various types of finite-state automata (FA), both deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, recognizes each chunk independently and in parallel by means of identical FAs, and finally joins the chunk results and checks their overall consistency. Since the state in which a chunk begins is unknown until the preceding chunks have been processed, the FA recognizing that chunk must be started speculatively in every possible state. This causes an overhead that reduces the speedup over a serial algorithm. Existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, and NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on a new FA type, the reduced interface DFA (RI-DFA), which minimizes the speculation overhead without incurring the penalty of nondeterministic transitions or of impractically enlarged DFAs. The algorithm is proved correct and theoretically efficient: it combines the state reduction of an NFA with the speed of deterministic transitions, thus improving on both existing DFA-based and NFA-based implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states on a large public benchmark of complex FAs. On multi-core architectures, the RI-DFA recognizer is much faster than the NFA-based one on all benchmarks, and it matches the DFA-based one on some benchmarks and performs much better on others. The extra time needed to construct an RI-DFA, compared to a DFA, is moderate and compatible with practical use.
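To illustrate the speculative chunk-and-join scheme summarized above, here is a minimal Python sketch of a plain DFA-based recognizer. It is not the RI-DFA construction, which the abstract does not detail. Assumptions: the DFA is partial and given as a dict `delta` from `(state, symbol)` to state; chunks run on a thread pool, which shows the structure but gives no real speedup under CPython's GIL; the first-symbol pruning in `start_states` is a simple hypothetical heuristic included only to make the number of speculative runs visible. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_from(delta, state, chunk):
    """Run the DFA over chunk starting in state; return the end state, or None if stuck."""
    for symbol in chunk:
        state = delta.get((state, symbol))
        if state is None:
            return None
    return state


def start_states(delta, states, chunk, prune):
    """Speculative start states for a chunk other than the first one."""
    if not prune or not chunk:
        return set(states)
    # Hypothetical pruning: in a partial DFA, a state with no transition on the
    # chunk's first symbol would get stuck immediately, so it is not started.
    return {q for q in states if (q, chunk[0]) in delta}


def recognize_chunk(delta, chunk, starts):
    """Map each speculative start state to the state reached at the chunk end."""
    return {q: run_from(delta, q, chunk) for q in starts}


def parallel_recognize(delta, states, initial, accepting, text, n_chunks, prune=True):
    """Return (accepted, number of speculative chunk runs)."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    # Only the first chunk knows its true start state; the others speculate.
    starts = [{initial}] + [start_states(delta, states, c, prune) for c in chunks[1:]]
    runs = sum(len(s) for s in starts)
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(recognize_chunk, delta, c, s) for c, s in zip(chunks, starts)]
        maps = [f.result() for f in futures]
    # Join phase: thread the real state through the chunk maps, left to right.
    state = initial
    for m in maps:
        state = m.get(state)  # None if the run got stuck or the state was pruned
        if state is None:
            return False, runs
    return state in accepting, runs


if __name__ == "__main__":
    # Hypothetical example: a DFA for (ab)* with states 0 (initial, final) and 1.
    delta = {(0, "a"): 1, (1, "b"): 0}
    print(parallel_recognize(delta, {0, 1}, 0, {0}, "abababab", 3))               # (True, 3)
    print(parallel_recognize(delta, {0, 1}, 0, {0}, "abababab", 3, prune=False))  # (True, 5)
    print(parallel_recognize(delta, {0, 1}, 0, {0}, "abba", 2))                   # (False, 2)
```

The `prune=False` call shows full speculation: every chunk after the first is started in all DFA states, so the number of runs grows with the state count. That per-chunk count of starting states is the overhead the abstract sets out to minimize.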
