We revisit the complexity of approximate pattern matching in an elastic-degenerate string. Such a string is a sequence of $n$ finite sets of strings of total length $N$, and compactly describes the collection of strings obtained by choosing exactly one string from every set and concatenating the choices in order. This is motivated by the need to store a collection of highly similar DNA sequences. The basic algorithmic question on elastic-degenerate strings is pattern matching: given such an elastic-degenerate string and a standard pattern of length $m$, check whether the pattern occurs in one of the strings in the described collection. Bernardini et al.~[SICOMP 2022] showed how to leverage fast matrix multiplication to obtain an $\tilde{\mathcal{O}}(nm^{\omega-1})+\mathcal{O}(N)$-time algorithm for this problem, where $\omega$ is the matrix multiplication exponent. However, the best result so far for finding occurrences with $k$ mismatches, where $k$ is a constant, is the $\tilde{\mathcal{O}}(nm^{2}+N)$-time algorithm of Pissis et al.~[CPM 2025]. This raises the question of whether increasing the dependency on $m$ from $m^{\omega-1}$ to quadratic is necessary when moving from $k=0$ to larger (but still constant) $k$. We design an $\tilde{\mathcal{O}}(nm^{1.5}+N)$-time algorithm for pattern matching with $k$ mismatches in an elastic-degenerate string, for any constant $k$. To obtain this time bound, we leverage the structural characterization of occurrences with $k$ mismatches of Charalampopoulos et al.~[FOCS 2020] together with the fast Fourier transform. We need to work with multiple patterns at the same time, instead of a single pattern, which requires refining the original characterization. This refinement might be of independent interest.
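To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of this paper or of any cited work: it simply enumerates every string in the described collection, which takes exponential time in general, and checks each window against the pattern. The function names (`expand`, `hamming`, `occurs_with_k_mismatches`), the list-of-lists encoding, the use of an empty string as an allowed choice, and the sample input are all assumptions made for illustration.

```python
from itertools import product


def expand(ed_string):
    """Yield every string described by an elastic-degenerate string.

    ed_string is assumed to be a list of lists of strings; one string is
    chosen from each inner list and the choices are concatenated in order.
    """
    for choice in product(*ed_string):
        yield "".join(choice)


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_k_mismatches(ed_string, pattern, k):
    """Return True if the pattern matches some window of some described
    string with at most k mismatches (brute force, for illustration only)."""
    m = len(pattern)
    for text in expand(ed_string):
        for i in range(len(text) - m + 1):
            if hamming(text[i:i + m], pattern) <= k:
                return True
    return False


# Hypothetical example with n = 3 sets.
ed = [["AC"], ["G", "TT", ""], ["CA"]]
print(sorted(expand(ed)))
print(occurs_with_k_mismatches(ed, "CTTC", 0))
print(occurs_with_k_mismatches(ed, "CGGC", 1))
print(occurs_with_k_mismatches(ed, "CGGC", 2))
```

The case $k=0$ corresponds to exact pattern matching. The bounds quoted above avoid enumerating the collection, whose size can be exponential in $n$.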