String matching is the problem of finding all occurrences of a pattern in a text. We propose improved versions of the fast family of string matching algorithms based on hashing $q$-grams. The improvement consists of choosing the minimal value of $q$ such that each $q$-gram of the pattern has a unique hash value. The new algorithms are faster than the algorithms of the HASH family for short patterns over large alphabets.
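To make the selection of $q$ concrete, the following is a minimal sketch of how the smallest $q$ with collision-free $q$-gram hashes could be found for a given pattern. The shift-and-add hash, the table size of 256, and the names `qgram_hash` and `minimal_unique_q` are assumptions made for illustration; they are not taken from the algorithms described here. The sketch also assumes that uniqueness is required across all positions of the pattern, so a $q$-gram that occurs twice counts as a collision.

```python
from typing import Optional


def qgram_hash(gram: str, table_size: int = 256) -> int:
    """Shift-and-add hash of a q-gram (an assumed, illustrative hash function)."""
    h = 0
    for ch in gram:
        h = ((h << 1) + ord(ch)) % table_size
    return h


def minimal_unique_q(pattern: str, table_size: int = 256) -> Optional[int]:
    """Return the smallest q such that the q-grams at all positions of the
    pattern have pairwise distinct hash values, or None for an empty pattern."""
    m = len(pattern)
    for q in range(1, m + 1):
        seen = set()
        unique = True
        # Hash the q-gram starting at every position of the pattern.
        for i in range(m - q + 1):
            h = qgram_hash(pattern[i:i + q], table_size)
            if h in seen:
                unique = False
                break
            seen.add(h)
        if unique:
            return q
    return None
```

For a non-empty pattern the loop always terminates with a value, since for $q$ equal to the pattern length there is only one $q$-gram. Patterns over a large alphabet tend to contain few repeated characters, so the returned $q$ is typically small.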