A pangenome captures the genetic diversity of multiple individuals simultaneously. It therefore provides a more comprehensive reference for genome analysis than a single linear reference genome, whose use may introduce allele bias. A widely adopted pangenome representation is the node-labeled directed graph, in which paths correspond to plausible genomic sequences within a species. Consequently, measuring the similarity between a sequence and a pangenome graph is a fundamental task in pangenome construction and analysis. This study examines the Longest Common Subsequence (LCS) problem and three of its variants, each defined between a sequence and a pangenome graph. We present four polynomial-time reductions that transform these LCS-related problems into the longest path problem in a directed acyclic graph (DAG). Because a longest path in a DAG can be computed in polynomial time, these reductions show that all four problems are solvable in polynomial time and hence belong to the complexity class P.
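The abstract does not spell out the reductions themselves. The code below is a hedged illustration only. It shows one standard product-graph construction that reduces the basic problem to a longest path in a DAG. Here the basic problem is the maximum, over all paths P of the graph, of LCS(s, label(P)). The sketch rests on three assumptions. First, the pangenome graph is itself acyclic. Second, every node carries a single character; longer labels could be split into chains of single-character nodes. Third, any path, not only a source-to-sink path, may be chosen. Because extending a path never decreases the LCS, the third assumption does not change the optimum. The names `longest_path_dag` and `lcs_sequence_vs_graph`, the states `(v, i)` (meaning "at node v, first i characters of s consumed"), and the `SOURCE`/`SINK` vertices are hypothetical. This is not presented as the authors' construction, and it does not cover the three variants.

```python
from collections import defaultdict, deque
from collections.abc import Hashable
from typing import Dict, List, Tuple

Edge = Tuple[Hashable, int]  # (successor, weight)


def longest_path_dag(adj: Dict[Hashable, List[Edge]], source: Hashable, target: Hashable) -> float:
    """Weight of a heaviest source->target path in a DAG (Kahn topological order + relaxation).

    Returns float('-inf') if target is unreachable from source.
    """
    indeg: Dict[Hashable, int] = defaultdict(int)
    nodes = set(adj)
    for u, out in adj.items():
        for v, _ in out:
            indeg[v] += 1
            nodes.add(v)
    queue = deque(u for u in nodes if indeg[u] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in adj.get(u, []):
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    best = {u: float("-inf") for u in nodes}
    best[source] = 0
    for u in order:  # every predecessor of u is finalized before u is relaxed
        if best[u] == float("-inf"):
            continue
        for v, w in adj.get(u, []):
            if best[u] + w > best[v]:
                best[v] = best[u] + w
    return best[target]


def lcs_sequence_vs_graph(s: str, labels: Dict[str, str], edges: List[Tuple[str, str]]) -> float:
    """max over paths P of LCS(s, label(P)), for an acyclic graph with one character per node.

    Assumes every node mentioned in `edges` has an entry in `labels`.
    """
    succ: Dict[str, List[str]] = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    m = len(s)
    S, T = "SOURCE", "SINK"  # plain strings cannot collide with the (v, i) tuple states
    adj: Dict[Hashable, List[Edge]] = defaultdict(list)
    for v, lab in labels.items():
        adj[S].append(((v, 0), 0))  # a path may start at any node
        for i in range(m + 1):
            here = (v, i)  # at node v (not yet consumed), s[:i] already consumed
            match = i < m and lab == s[i]
            if i < m:
                adj[here].append(((v, i + 1), 0))  # skip character s[i]
            adj[here].append((T, 1 if match else 0))  # consume v (matching s[i] if possible) and stop
            for w in succ[v]:
                adj[here].append(((w, i), 0))  # leave v unmatched, move to successor w
                if match:
                    adj[here].append(((w, i + 1), 1))  # match v with s[i], move to w
    # States ordered by (topological position of v, i) never repeat, so the product graph is a DAG.
    return longest_path_dag(adj, S, T)


if __name__ == "__main__":
    # A small bubble: A -> {C | T} -> G, i.e. the two paths spell "ACG" and "ATG".
    labels = {"n1": "A", "n2": "C", "n3": "T", "n4": "G"}
    edges = [("n1", "n2"), ("n2", "n4"), ("n1", "n3"), ("n3", "n4")]
    print(lcs_sequence_vs_graph("ACG", labels, edges))  # 3
```

Each weight-1 edge matches one node label with one character of s and advances both positions, so a heaviest SOURCE-to-SINK path corresponds to a longest common subsequence of s and some graph path. Under the assumptions above, the product graph has O(|V|·|s|) vertices and O((|V|+|E|)·|s|) edges. This sketch therefore runs in polynomial time, which illustrates the kind of bound the abstract claims.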