The longest common subsequence (LCS) problem is a fundamental problem in string processing, with numerous algorithmic studies, extensions, and applications. A sequence $u_1, \ldots, u_f$ of $f$ strings is said to be an ($f$-)segmentation of a string $P$ if $P = u_1 \cdots u_f$. Li et al. [BIBM 2022] proposed a new variant of the LCS problem for given strings $T_1, T_2$ and an integer $f$, which we hereby call the segmental LCS problem (SegLCS): find (the length of) a longest string $P$ that has an $f$-segmentation which can be embedded into both $T_1$ and $T_2$. Li et al. [IJTCS-FAW 2024] gave a dynamic programming solution that solves SegLCS in $O(fn_1n_2)$ time with $O(fn_1 + n_2)$ space, where $n_1 = |T_1|$, $n_2 = |T_2|$, and $n_1 \le n_2$. Recently, Banerjee et al. [ESA 2024] presented an algorithm which, for a constant $f \geq 3$, solves SegLCS in $\tilde{O}((n_1n_2)^{1-(1/3)^{f-2}})$ time. In this paper, we deal with SegLCS as well as the problem of segmental subsequence pattern matching, SegE, which asks whether a pattern $P$ of length $m$ has an $f$-segmentation that can be embedded into a text $T$ of length $n$. When $f = 1$, this is equivalent to substring matching, and when $f = |P|$, this is equivalent to subsequence matching. Our focus in this article is the case of general values of $f$, and our main contributions are threefold: (1) a conditional lower bound under the strong exponential-time hypothesis (SETH), showing that SegE cannot be solved in $O((mn)^{1-\epsilon})$ time for any constant $\epsilon > 0$; (2) an $O(mn)$-time algorithm for SegE; and (3) an $O(fn_2(n_1 - \ell+1))$-time algorithm for SegLCS, where $\ell$ is the solution length.
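To make the SegE definition concrete, the following is a minimal sketch of a textbook-style dynamic program. It is not claimed to be the algorithm of this paper. It computes the smallest number of contiguous blocks of $T$ needed to embed $P$ in order, and answers SegE by comparing that number with $f$. The function names `min_segments` and `seg_embeddable` are hypothetical, and the sketch assumes $1 \le f \le |P|$, so that an embedding using fewer than $f$ segments can always be refined into one using exactly $f$ by splitting segments.

```python
from typing import Optional

INF = float("inf")


def min_segments(P: str, T: str) -> Optional[int]:
    """Fewest contiguous blocks of T that spell P in order, or None if P
    is not a subsequence of T. Runs in O(|P| * |T|) time."""
    m, n = len(P), len(T)
    if m == 0:
        return 0
    # prev_end[j]:  fewest segments for the current prefix of P when its
    #               last character is matched to T[j-1] (INF if impossible).
    # prev_best[j]: minimum of prev_end[1..j].
    # Base case (empty prefix): zero segments, no last matched position.
    prev_end = [INF] * (n + 1)
    prev_best = [0] * (n + 1)
    for i in range(1, m + 1):
        cur_end = [INF] * (n + 1)
        cur_best = [INF] * (n + 1)
        for j in range(1, n + 1):
            if P[i - 1] == T[j - 1]:
                extend = prev_end[j - 1]      # continue the current segment
                fresh = prev_best[j - 1] + 1  # open a new segment at T[j-1]
                cur_end[j] = min(extend, fresh)
            cur_best[j] = min(cur_best[j - 1], cur_end[j])
        prev_end, prev_best = cur_end, cur_best
    return None if prev_best[n] == INF else prev_best[n]


def seg_embeddable(P: str, T: str, f: int) -> bool:
    """SegE under the assumption 1 <= f <= len(P)."""
    k = min_segments(P, T)
    return k is not None and k <= f
```

For instance, `seg_embeddable("abcd", "abxcd", 2)` holds via the segmentation `ab`, `cd`, while `f = 1` fails because `abcd` is not a substring of `abxcd`.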