Motivated by the computation of duplication patterns in sequences, we propose a new fundamental problem called the longest subsequence-repeated subsequence (LSRS) problem. Given a sequence $S$ of length $n$, a subsequence-repeated subsequence is a subsequence of $S$ of the form $x_1^{d_1}x_2^{d_2}\cdots x_k^{d_k}$, where each $x_i$ is a subsequence of $S$, $x_j\neq x_{j+1}$, and $d_i\geq 2$ for all $i\in[k]$ and $j\in[k-1]$. We first present an $O(n^6)$ time algorithm that computes the longest cubic subsequences of all the $O(n^2)$ substrings of $S$, improving the trivial $O(n^7)$ bound. We then obtain an $O(n^6)$ time algorithm for computing the LSRS of $S$. Finally, we focus on two variants of this problem. We first consider the constrained version in which $\Sigma$ is unbounded, each letter appears in $S$ at most $d$ times, and all the letters of $\Sigma$ must appear in the solution. We show that this problem is NP-hard for $d=4$, via a reduction from a special version of SAT (which is obtained from 3-COLORING). We then show that when each letter appears in $S$ at most $d=3$ times, the problem is solvable in $O(n^5)$ time.
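To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not the $O(n^6)$ algorithm referred to above: it enumerates all subsequences and therefore takes exponential time, so it is only suitable for very short inputs. The function names `is_power`, `is_srs`, and `lsrs_bruteforce` are hypothetical and chosen for illustration. The sketch assumes that the condition $x_j\neq x_{j+1}$ can be ignored when testing membership, since two equal adjacent blocks $x^{d_j}x^{d_{j+1}}$ can be merged into the single block $x^{d_j+d_{j+1}}$.

```python
from itertools import combinations


def is_power(t: str) -> bool:
    """Return True if t == x * d for some non-empty x and some d >= 2."""
    n = len(t)
    for p in range(1, n // 2 + 1):  # p is the candidate length of x
        if n % p == 0 and t[:p] * (n // p) == t:
            return True
    return False


def is_srs(t: str) -> bool:
    """Return True if t splits into consecutive blocks, each a power with d >= 2.

    Assumption: equal adjacent blocks can be merged, so x_j != x_{j+1}
    does not need to be checked explicitly.
    """
    n = len(t)
    ok = [False] * (n + 1)
    ok[0] = True  # the empty string is treated as the k = 0 case
    for i in range(2, n + 1):
        # t[:i] is valid if some valid prefix t[:j] is followed by a power t[j:i]
        ok[i] = any(ok[j] and is_power(t[j:i]) for j in range(i - 1))
    return ok[n]


def lsrs_bruteforce(s: str) -> str:
    """Return one longest subsequence of s accepted by is_srs (exponential time)."""
    for length in range(len(s), 0, -1):  # try the longest candidates first
        for idx in combinations(range(len(s)), length):
            cand = "".join(s[i] for i in idx)
            if is_srs(cand):
                return cand
    return ""  # no non-empty subsequence of the required form exists
```

For example, `lsrs_bruteforce("abcab")` drops the letter `c` and returns `"abab"`, which is $(ab)^2$, while `lsrs_bruteforce("abc")` returns the empty string because no letter can be repeated.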