In topological data analysis, persistence modules are used to capture geometric features of data sets. The matching distance $d_\mathcal{M}$ measures the difference between two $2$-parameter persistence modules by taking the maximum bottleneck distance between $1$-parameter slices of the modules. The best previously known algorithm for computing $d_\mathcal{M}$ exactly runs in $O(n^{8+\omega})$ time and uses $O(n^4)$ space, where $n$ is the number of generators and relations of the modules and $\omega$ is the matrix multiplication exponent. We improve significantly on this by describing an algorithm with expected running time $O(n^5 \log^3 n)$ that uses $O(n^2)$ space. We first solve the decision problem $d_\mathcal{M}\leq \lambda$ for a fixed $\lambda$ in $O(n^5\log n)$ time by traversing a line arrangement in the dual plane, where each point represents a slice. We then lift the line arrangement to a plane arrangement in $\mathbb{R}^3$ whose vertices represent possible values of $d_\mathcal{M}$, and use a randomized incremental method to search through the vertices and find $d_\mathcal{M}$. The expected running time of this algorithm is $O((n^4+T(n))\log^2 n)$, where $T(n)$ is an upper bound on the time needed to decide whether $d_\mathcal{M}\leq \lambda$. Moreover, we show how to compute the matching distance using only linear space, at the price of a much worse time complexity.
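As a quick check of how the stated bounds fit together, assuming the decision procedure's bound $T(n) = O(n^5\log n)$ is substituted directly into the expected running time of the search:

$$O\big((n^4 + T(n))\log^2 n\big) = O\big((n^4 + n^5\log n)\log^2 n\big) = O(n^5\log^3 n).$$

Under this substitution the $n^4$ term is dominated, so the cost of the decision procedure determines the overall expected running time.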