Revisiting the Complexity of and Algorithms for the Graph Traversal Edit Distance and Its Variants

The graph traversal edit distance (GTED) is an elegant distance measure defined as the minimum edit distance between strings reconstructed from Eulerian trails in two edge-labeled graphs. GTED can be used to infer evolutionary relationships between species by comparing de Bruijn graphs directly, without the computationally costly and error-prone process of genome assembly. Ebrahimpour Boroojeny et al. (2018) suggest two ILP formulations for GTED and claim that GTED is polynomially solvable because the linear programming relaxation of one of the ILPs always yields optimal integer solutions. The claim that GTED is polynomially solvable contradicts the complexity results for existing string-to-graph matching problems. We resolve this conflict by proving that GTED is NP-complete and by showing that the ILPs proposed by Ebrahimpour Boroojeny et al. do not solve GTED: they instead compute a lower bound on GTED, and they are not solvable in polynomial time. In addition, we provide the first two correct ILP formulations of GTED and evaluate their empirical efficiency. These results provide solid algorithmic foundations for comparing genome graphs and point toward approximation heuristics. The source code to reproduce the experimental results is available at https://github.com/Kingsford-Group/gtednewilp/.
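To make the definition concrete, the following is a minimal brute-force sketch in Python. It is not one of the ILP formulations discussed above and is not taken from the linked repository. It assumes directed graphs given as lists of `(source, target, label)` edges, takes an Eulerian trail to be any walk that uses every edge exactly once, and uses unit-cost Levenshtein distance. The function names (`edit_distance`, `eulerian_strings`, `gted_bruteforce`) are hypothetical. The enumeration is exponential in the number of edges, so it is only usable on toy inputs.

```python
from itertools import product


def edit_distance(a, b):
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def eulerian_strings(edges):
    """Return the set of strings spelled by trails that use every edge once.

    edges is a list of (source, target, label) tuples (an assumed input
    format). Trails are enumerated by backtracking from every source node.
    """
    results = set()
    used = [False] * len(edges)

    def extend(node, spelled):
        if len(spelled) == len(edges):
            results.add("".join(spelled))
            return
        for i, (u, v, label) in enumerate(edges):
            if not used[i] and u == node:
                used[i] = True
                spelled.append(label)
                extend(v, spelled)
                spelled.pop()
                used[i] = False

    for start in {u for u, _, _ in edges}:
        extend(start, [])
    return results


def gted_bruteforce(edges1, edges2):
    """Minimum edit distance over all pairs of Eulerian-trail strings."""
    s1, s2 = eulerian_strings(edges1), eulerian_strings(edges2)
    if not s1 or not s2:
        raise ValueError("each graph must have at least one Eulerian trail")
    return min(edit_distance(a, b) for a, b in product(s1, s2))


# Two toy 3-cycles whose labels differ in a single edge.
g1 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "G")]
g2 = [(0, 1, "A"), (1, 2, "C"), (2, 0, "T")]
print(gted_bruteforce(g1, g2))  # 1: "ACG" vs "ACT" differ by one substitution
```

The sketch enumerates every trail explicitly, which is consistent with the hardness result stated above: no polynomial-time shortcut is assumed, and realistic genome graphs would need the ILP formulations or heuristics instead.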

