Faster Weighted and Unweighted Tree Edit Distance and APSP Equivalence

The tree edit distance (TED) between two rooted ordered trees with $n$ nodes labeled from an alphabet $\Sigma$ is the minimum cost of transforming one tree into the other by a sequence of valid operations consisting of insertions, deletions, and relabelings of nodes. The tree edit distance is a well-known generalization of string edit distance and has been studied since the 1970s. Years of steady improvements have led to an $O(n^3)$ algorithm [DMRW 2010]. Fine-grained complexity sheds light on the hardness of TED, showing that a truly subcubic time algorithm for TED implies a truly subcubic time algorithm for All-Pairs Shortest Paths (APSP) [BGMW 2020]. Therefore, under the popular APSP hypothesis, a truly subcubic time algorithm for TED cannot exist. However, unlike many problems in fine-grained complexity for which conditional hardness based on APSP also comes with equivalence to APSP, whether TED can be reduced to APSP has remained unknown. In this paper, we resolve this question. Not only do we show that TED is fine-grained equivalent to APSP, but our reduction is also tight enough that, combined with the fastest APSP algorithm to date [Williams 2018], it gives the first subcubic time algorithm for TED, running in $n^3/2^{\Omega(\sqrt{\log{n}})}$ time. We also consider the unweighted tree edit distance problem, in which the cost of each edit is one. For unweighted TED, a truly subcubic algorithm is known due to Mao [Mao 2022], later improved slightly by D\"{u}rr [D\"{u}rr 2023] to run in $O(n^{2.9148})$ time. Their algorithms use bounded monotone min-plus product as a crucial subroutine, and the best running time for this product is $\tilde{O}(n^{\frac{3+\omega}{2}})\leq O(n^{2.6857})$ (where $\omega$ is the exponent of fast matrix multiplication). In this work, we close this gap and give an algorithm for unweighted TED that runs in $\tilde{O}(n^{\frac{3+\omega}{2}})$ time.
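To make the definition of TED concrete, the following is a minimal sketch of the classic textbook recursion on ordered forests, which repeatedly considers the rightmost root of each forest. It is not the algorithm of this paper, nor the $O(n^3)$ algorithm of [DMRW 2010]; it only illustrates what is being computed. The tree encoding (nested tuples), the function name `tree_edit_distance`, and the three cost parameters are assumptions made for this illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples are hashable, so forests
# can be used directly as memoization keys.


def tree_edit_distance(t1, t2, delete_cost=1, insert_cost=1, relabel_cost=1):
    """Edit distance between two rooted ordered trees (illustrative sketch)."""

    @lru_cache(maxsize=None)
    def dist(f, g):
        # Both forests empty: nothing left to edit.
        if not f and not g:
            return 0
        # Only f is non-empty: delete its rightmost root, whose children
        # take its place in the forest.
        if not g:
            _, kids = f[-1]
            return dist(f[:-1] + kids, g) + delete_cost
        # Only g is non-empty: insert its rightmost root.
        if not f:
            _, kids = g[-1]
            return dist(f, g[:-1] + kids) + insert_cost
        (lv, kv), (lw, kw) = f[-1], g[-1]
        return min(
            # Delete the rightmost root of f.
            dist(f[:-1] + kv, g) + delete_cost,
            # Insert the rightmost root of g.
            dist(f, g[:-1] + kw) + insert_cost,
            # Match the two rightmost roots (relabel if labels differ),
            # then solve the remaining forests and the child forests.
            dist(f[:-1], g[:-1]) + dist(kv, kw)
            + (0 if lv == lw else relabel_cost),
        )

    return dist((t1,), (t2,))


if __name__ == "__main__":
    a = ("a", (("b", ()), ("c", ())))
    b = ("a", (("b", ()),))
    print(tree_edit_distance(a, b))  # deleting the leaf "c" suffices
```

With all three costs set to one, this computes the unweighted TED discussed above; passing different costs gives a simple uniform-cost weighted variant. Memoizing on whole forests keeps the sketch short but makes no attempt at the running times discussed in the abstract.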

