The graph traversal edit distance (GTED) is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly, without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al.~(2018) propose two integer linear programming (ILP) formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. This claim contradicts the known complexity results for existing string-to-graph matching problems. We resolve the conflict by proving that GTED is NP-complete and by showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED but instead compute a lower bound on GTED, and that they are not solvable in polynomial time. In addition, we provide the first two correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point toward approximation heuristics. The source code to reproduce the experimental results is available at https://github.com/Kingsford-Group/gtednewilp/.
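To make the definition concrete, the following is a minimal brute-force sketch of GTED on toy graphs, written directly from the definition above: enumerate the strings spelled by all Eulerian trails in each graph, then take the minimum pairwise edit distance. It is not one of the ILP formulations discussed here and is not taken from the linked repository. The function names (`eulerian_trail_strings`, `edit_distance`, `gted_bruteforce`) and the edge-list representation `(tail, head, label)` are assumptions made for illustration. The enumeration takes exponential time in general, so it is only usable on very small inputs.

```python
from itertools import product


def eulerian_trail_strings(edges):
    """Return the set of strings spelled by all Eulerian trails.

    `edges` is a list of directed, labeled edges (tail, head, label).
    An Eulerian trail uses every edge exactly once; both open and
    closed trails are accepted here (an assumption of this sketch).
    """
    n = len(edges)
    out_edges = {}
    for i, (tail, _head, _label) in enumerate(edges):
        out_edges.setdefault(tail, []).append(i)

    strings = set()
    used = [False] * n

    def extend(node, labels):
        # All edges consumed: the labels spell one Eulerian trail.
        if len(labels) == n:
            strings.add("".join(labels))
            return
        for i in out_edges.get(node, []):
            if not used[i]:
                used[i] = True
                labels.append(edges[i][2])
                extend(edges[i][1], labels)
                labels.pop()
                used[i] = False

    for start in out_edges:
        extend(start, [])
    return strings


def edit_distance(a, b):
    """Levenshtein distance with unit-cost insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match or substitute
        prev = cur
    return prev[-1]


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian-trail strings."""
    s1 = eulerian_trail_strings(edges1)
    s2 = eulerian_trail_strings(edges2)
    if not s1 or not s2:
        raise ValueError("each graph must contain at least one Eulerian trail")
    return min(edit_distance(a, b) for a, b in product(s1, s2))


# Toy inputs: a 3-edge cycle and a 3-edge path.
g1 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "G")]  # spells ACG, CGA, GAC
g2 = [(0, 1, "C"), (1, 2, "G"), (2, 3, "T")]  # spells CGT only
print(gted_bruteforce(g1, g2))  # 1 (CGA vs CGT, one substitution)
```

The cycle `g1` has three Eulerian trails, one per starting vertex, while the path `g2` has only one. The minimum is attained by the trail of `g1` that starts at vertex 1, which shows why GTED must optimize over trails instead of fixing a single traversal.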