In this paper, we study for the first time the Diverse Longest Common Subsequences (LCSs) problem under Hamming distance. Given a constant number of input strings, the problem asks whether there exists a set $\mathcal X$ of $K$ longest common subsequences whose diversity is at least a specified threshold $\Delta$. We consider two diversity measures for a set $\mathcal X$ of strings of equal length: the Sum diversity and the Min diversity, defined as the sum and the minimum, respectively, of the Hamming distances over all pairs of strings in $\mathcal X$. We analyze the computational complexity of the corresponding problems, called Max-Sum and Max-Min Diverse LCSs, respectively, from the viewpoints of both approximation algorithms and parameterized complexity. Our results are summarized as follows. When $K$ is bounded, both problems are polynomial-time solvable. In contrast, when $K$ is unbounded, both problems become NP-hard, while the Max-Sum Diverse LCSs problem admits a PTAS. Furthermore, we analyze the parameterized complexity of both problems with combinations of the parameters $K$ and $r$, where $r$ is the length of the candidate strings to be selected. Importantly, all positive results above are proven in a more general setting, where the input is an edge-labeled directed acyclic graph (DAG) that succinctly represents a set of strings of the same length. Negative results are proven in the setting where the input is given explicitly as a set of strings. The latter results are equipped with an encoding of such a set as the set of longest common subsequences of a specific set of input strings.
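The two diversity measures can be illustrated with a minimal Python sketch. It assumes the candidate strings (for example, the LCSs) are already given explicitly as a list, and it decides the problem by brute force over all size-$K$ subsets. This is not the DAG-based method of the paper; the function names `hamming`, `sum_diversity`, `min_diversity`, and `has_diverse_subset` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def hamming(x, y):
    """Hamming distance between two strings of equal length."""
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


def sum_diversity(strings):
    """Sum diversity: sum of Hamming distances over all pairs."""
    return sum(hamming(x, y) for x, y in combinations(strings, 2))


def min_diversity(strings):
    """Min diversity: minimum Hamming distance over all pairs.

    Assumes at least two strings; min() raises ValueError otherwise.
    """
    return min(hamming(x, y) for x, y in combinations(strings, 2))


def has_diverse_subset(candidates, k, delta, diversity):
    """Brute-force decision (illustrative only): is there a subset of k
    candidates whose diversity is at least delta?

    Enumerates all C(len(candidates), k) subsets, so it is practical only
    for a small, explicitly listed candidate set and small k.
    """
    return any(
        diversity(subset) >= delta
        for subset in combinations(candidates, k)
    )


# Hypothetical candidate set of equal-length strings.
candidates = ["abab", "baba", "abba"]
print(sum_diversity(candidates))   # 4 + 2 + 2 = 8
print(min_diversity(candidates))   # 2
print(has_diverse_subset(candidates, 2, 4, min_diversity))  # True
```

Passing `sum_diversity` or `min_diversity` as the `diversity` argument switches between the Max-Sum and Max-Min variants of the decision question.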