Motivated by computing duplication patterns in sequences, we propose a new fundamental problem, the longest subsequence-repeated subsequence (LSRS). Given a sequence $S$ of length $n$, a subsequence-repeated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$, where each $x_i$ is a subsequence of $S$, $d_i\geq 2$ for all $i\in[k]$, and $x_j\neq x_{j+1}$ for all $j\in[k-1]$. We first present an $O(n^6)$ time algorithm to compute the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. We then obtain an $O(n^6)$ time algorithm for computing an LSRS of $S$. Finally, we focus on two variants of this problem. We first consider the constrained version in which $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times, and all the letters in $\Sigma$ must appear in the solution. We show that this problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, the problem is solvable in $O(n^5)$ time.
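To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the $O(n^6)$ algorithm announced above; it enumerates all subsequences and is exponential in $n$, so it is only meant for checking tiny examples. The function names `is_repeat`, `is_srs`, and `lsrs_bruteforce` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_repeat(u: str) -> bool:
    """Return True if u == w * d for some nonempty w and some integer d >= 2."""
    n = len(u)
    return any(n % p == 0 and u == u[:p] * (n // p) for p in range(1, n // 2 + 1))


def is_srs(t: str) -> bool:
    """Return True if t has the form x_1^{d_1} ... x_k^{d_k} with every d_i >= 2.

    The condition x_j != x_{j+1} needs no separate check: two adjacent equal
    blocks can always be merged into one block with a larger exponent.
    """
    n = len(t)
    ok = [False] * (n + 1)  # ok[j]: the prefix t[:j] splits into valid blocks
    ok[0] = True
    for j in range(2, n + 1):
        ok[j] = any(ok[i] and is_repeat(t[i:j]) for i in range(j - 1))
    return ok[n]


def lsrs_bruteforce(s: str) -> str:
    """Return one longest subsequence-repeated subsequence of s (exponential time)."""
    for length in range(len(s), 0, -1):
        for idx in combinations(range(len(s)), length):
            t = "".join(s[i] for i in idx)
            if is_srs(t):
                return t
    return ""
```

For instance, for $S = abcab$ the sketch returns $abab = (ab)^2$, obtained by deleting the letter $c$, while $S = aabb = a^2b^2$ is already of the required form.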