Motivated by computing duplication patterns in sequences, we propose a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) problem. Given a sequence $S$ of length $n$, a subsequence-repeated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$, where each $x_i$ is a subsequence of $S$, $x_j\neq x_{j+1}$, and $d_i\geq 2$ for all $i\in[k]$ and $j\in[k-1]$. We first present an $O(n^6)$ time algorithm to compute the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. We then obtain an $O(n^6)$ time algorithm for computing the LSRS of $S$. Finally, we focus on two variants of this problem. We first consider the constrained version in which $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times, and all the letters in $\Sigma$ must appear in the solution. We show that this version is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, the problem is solvable in $O(n^5)$ time.
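To make the definition above concrete, the following is a minimal Python sketch of an exhaustive search for tiny inputs. It is not the $O(n^6)$ algorithm mentioned in the abstract; it runs in exponential time and only illustrates the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$. The function names `is_repeated_form` and `brute_force_lsrs` are hypothetical, introduced here for illustration. The condition $x_j\neq x_{j+1}$ is not checked explicitly, on the observation that two adjacent equal blocks $x^a x^b$ can always be merged into $x^{a+b}$, so it does not change which strings qualify.

```python
from itertools import combinations


def is_repeated_form(t: str) -> bool:
    """Return True if t = x_1^{d_1} ... x_k^{d_k} with every d_i >= 2."""
    n = len(t)
    # ok[i] is True when the prefix t[:i] splits into blocks x^d with d >= 2.
    ok = [False] * (n + 1)
    ok[0] = True  # assumption: the empty string (k = 0) is accepted
    for i in range(1, n + 1):
        for j in range(i):
            if not ok[j]:
                continue
            block = t[j:i]
            m = len(block)
            # block equals x^d with d >= 2 iff it is a prefix of length p
            # repeated m // p times, for some p dividing m with p <= m // 2.
            if any(
                m % p == 0 and block == block[:p] * (m // p)
                for p in range(1, m // 2 + 1)
            ):
                ok[i] = True
                break
    return ok[n]


def brute_force_lsrs(s: str) -> str:
    """Return one longest subsequence of s in the repeated form (exhaustive)."""
    n = len(s)
    # Try candidate lengths from longest to shortest; the first hit is optimal.
    for length in range(n, 0, -1):
        for idx in combinations(range(n), length):
            cand = "".join(s[i] for i in idx)
            if is_repeated_form(cand):
                return cand
    return ""  # no non-empty subsequence of s has the required form
```

For instance, on the input `"abcab"` this search finds `"abab"`, which is $x_1^{2}$ with $x_1 = ab$, while on `"abc"` no non-empty solution exists because no letter repeats.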