String matching is the problem of finding all occurrences of a pattern in a text. It has been studied intensively, and the Boyer-Moore string matching algorithm is probably one of the most famous solutions to this problem. The algorithm uses two precomputed shift tables, the good-suffix table and the bad-character table. The good-suffix table is tricky to compute in linear time, and textbook solutions perform redundant operations. Here we present a fast implementation of the good-suffix table based on a tight analysis of the pattern. Experimental results show that two versions of this new implementation are the fastest in almost all tested situations.
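For orientation, the two shift tables can be illustrated with a minimal Python sketch of the classical textbook preprocessing. This is the widely taught baseline, not the faster implementation presented here. The function names (`suffixes`, `good_suffix_table`, `bad_character_table`, `boyer_moore`) are chosen for illustration only and are assumptions, not names taken from this work.

```python
def suffixes(x):
    """suff[i] = length of the longest common suffix of x[:i+1] and x."""
    m = len(x)
    suff = [0] * m
    suff[m - 1] = m
    f = g = m - 1
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            # Reuse a value already computed inside the current window.
            suff[i] = suff[i + m - 1 - f]
        else:
            if i < g:
                g = i
            f = i
            while g >= 0 and x[g] == x[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def good_suffix_table(x):
    """Textbook good-suffix shifts: shift[i] applies after a mismatch at x[i]."""
    m = len(x)
    suff = suffixes(x)
    shift = [m] * m
    j = 0
    # Case 1: a prefix of x matches a suffix of the matched part.
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if shift[j] == m:
                    shift[j] = m - 1 - i
                j += 1
    # Case 2: the matched suffix reoccurs elsewhere in x.
    for i in range(m - 1):
        shift[m - 1 - suff[i]] = m - 1 - i
    return shift


def bad_character_table(x):
    """Distance from the rightmost occurrence of each character (last position excluded) to the end."""
    m = len(x)
    return {c: m - 1 - i for i, c in enumerate(x[:-1])}


def boyer_moore(pattern, text):
    """Return the start positions of all occurrences of pattern in text."""
    m, n = len(pattern), len(text)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    gs = good_suffix_table(pattern)
    bc = bad_character_table(pattern)
    hits = []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1
        if i < 0:
            hits.append(j)
            j += gs[0]
        else:
            # Characters absent from the table get the default shift m.
            j += max(gs[i], bc.get(text[j + i], m) - m + 1 + i)
    return hits
```

In this sketch, `good_suffix_table` runs two passes over the output of `suffixes`, and its first pass rechecks entries that may already be final. That is one example of the kind of extra work a tighter analysis of the pattern could aim to avoid.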