Minimizing speculation overhead in a parallel recognizer for regular texts

Speculative data-parallel algorithms for language recognition have been widely studied experimentally for various types of finite automata (FA), both deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions. Such an algorithm cuts the input string into chunks, recognizes each chunk independently and in parallel by means of identical FAs, and finally joins the chunk results and checks their overall consistency. In chunk recognition, the FAs must be started speculatively in any state, which causes an overhead that reduces the speedup over a serial algorithm. Existing data-parallel DFA-based recognizers suffer from an excessive number of starting states, while NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on a new FA type called reduced interface DFA (RI-DFA), which minimizes the speculation overhead without incurring the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is proved correct and is theoretically efficient because it combines the state reduction of an NFA with the speed of deterministic transitions, thus improving on existing DFA-based and NFA-based implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states on a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is much faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on others. The extra time needed to construct an RI-DFA rather than a DFA is moderate and compatible with practical use.
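The chunk-and-join scheme described above can be illustrated with a minimal sketch of the baseline speculative DFA recognizer, which starts every non-initial chunk from all states. It is not the RI-DFA construction, which the abstract does not specify. The toy automaton (even number of `a`), the names `run_chunk`, `split`, `recognize_parallel`, and `recognize_serial`, and the chunk count are all assumptions made for illustration. A thread pool is used only to show the structure; under the standard CPython interpreter lock it is not expected to yield a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy DFA over {a, b}: accepts strings with an even number of 'a'.
STATES = (0, 1)
START = 0
ACCEPTING = {0}
DELTA = {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}


def run_chunk(chunk, speculate):
    """Return a map: candidate starting state -> state reached after the chunk."""
    result = {}
    # Speculation overhead: one full scan of the chunk per candidate starting state.
    for start in (STATES if speculate else (START,)):
        q = start
        for ch in chunk:
            q = DELTA[(q, ch)]
        result[start] = q
    return result


def split(text, n_chunks):
    """Cut the text into at most n_chunks pieces of (nearly) equal size."""
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)]


def recognize_parallel(text, n_chunks=4):
    chunks = split(text, n_chunks)
    if not chunks:
        return START in ACCEPTING
    # Only the first chunk knows its true starting state; the others speculate.
    flags = [i > 0 for i in range(len(chunks))]
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(run_chunk, chunks, flags))
    # Join phase: thread the actual state through the per-chunk maps.
    q = START
    for m in maps:
        q = m[q]
    return q in ACCEPTING


def recognize_serial(text):
    """Reference serial recognizer used to check the parallel result."""
    q = START
    for ch in text:
        q = DELTA[(q, ch)]
    return q in ACCEPTING
```

In this sketch each speculative chunk is scanned once per DFA state, so the extra work grows with the number of starting states; that is the overhead the abstract says the RI-DFA is designed to reduce.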
