We study the classic Text-to-Pattern Hamming Distances problem: given a pattern $P$ of length $m$ and a text $T$ of length $n$, both over a polynomial-size alphabet, compute the Hamming distance between $P$ and $T[i\, .\, . \, i+m-1]$ for every shift $i$, under the standard Word-RAM model with $\Theta(\log n)$-bit words.

- We provide an $O(n\sqrt{m})$ time Las Vegas randomized algorithm for this problem, beating the decades-old $O(n \sqrt{m \log m})$ running time [Abrahamson, SICOMP 1987]. We also obtain a deterministic algorithm, with a slightly higher $O(n\sqrt{m}(\log m\log\log m)^{1/4})$ running time. Our randomized algorithm extends to the $k$-bounded setting, with running time $O\big(n+\frac{nk}{\sqrt{m}}\big)$, removing all the extra logarithmic factors from earlier algorithms [Gawrychowski and Uzna\'{n}ski, ICALP 2018; Chan, Golan, Kociumaka, Kopelowitz and Porat, STOC 2020].
- For the $(1+\epsilon)$-approximate version of Text-to-Pattern Hamming Distances, we give an $\tilde{O}(\epsilon^{-0.93}n)$ time Monte Carlo randomized algorithm, beating the previous $\tilde{O}(\epsilon^{-1}n)$ running time [Kopelowitz and Porat, FOCS 2015; Kopelowitz and Porat, SOSA 2018].

Our approximation algorithm exploits a connection with $3$SUM, and uses a combination of Fredman's trick, equality matrix product, and random sampling; in particular, we obtain new results on approximate counting versions of $3$SUM and Exact Triangle, which may be of independent interest. Our exact algorithms use a novel combination of hashing, bit-packed FFT, and recursion; in particular, we obtain a faster algorithm for computing the sumset of two integer sets, in the regime when the universe size is close to quadratic in the number of elements. We also prove a fine-grained equivalence between the exact Text-to-Pattern Hamming Distances problem and a range-restricted, counting version of $3$SUM.
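To make the problem statement concrete, the following is a minimal Python sketch of the input/output behavior only; it is not any of the algorithms announced above. The function names, the 0-based indexing of shifts, and the use of Python strings for $P$ and $T$ are assumptions made for illustration. The first function applies the definition directly in $O(nm)$ time. The second counts matches symbol by symbol by pairing equal symbols at text position $t$ and pattern position $j$ and crediting shift $i = t - j$, which hints at why the problem relates to counting pairs with a prescribed difference, in the spirit of the $3$SUM connection mentioned above.

```python
from collections import defaultdict


def hamming_distances_naive(pattern, text):
    """Hamming distance between pattern and text[i:i+m] for every shift i (0-based).

    Direct application of the definition; runs in O(n * m) time.
    """
    m, n = len(pattern), len(text)
    return [
        sum(1 for j in range(m) if pattern[j] != text[i + j])
        for i in range(n - m + 1)
    ]


def hamming_distances_by_symbol(pattern, text):
    """Same output, computed by counting matches per symbol.

    For each pair (t, j) with text[t] == pattern[j], shift i = t - j gains
    one match; the distance at shift i is m minus its number of matches.
    Worst-case time is still O(n * m) (e.g. a one-letter alphabet), so this
    is an illustrative baseline, not a fast algorithm.
    """
    m, n = len(pattern), len(text)
    if m > n:
        return []

    # Positions of each symbol in the pattern.
    pattern_positions = defaultdict(list)
    for j, symbol in enumerate(pattern):
        pattern_positions[symbol].append(j)

    matches = [0] * (n - m + 1)
    for t, symbol in enumerate(text):
        for j in pattern_positions.get(symbol, ()):
            i = t - j
            if 0 <= i <= n - m:  # keep only shifts where the pattern fits
                matches[i] += 1

    return [m - count for count in matches]
```

For example, calling either function with pattern `"abc"` and text `"abcabd"` yields one distance per shift `i = 0, ..., 3`. Classical approaches speed up the per-symbol match counting with FFT-based convolution; that step is omitted here to keep the sketch dependency-free.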