We introduce a decomposition method for the distributed calculation of exact Euclidean minimum spanning trees in high dimensions, where sub-quadratic algorithms are not effective. The method also applies to more general geometric minimum spanning trees of complete graphs. In such a graph $G=(V,E)$, each vertex $v\in V$ is represented by a vector $\vec{v}\in \mathbb{R}^n$, and the weight of every edge is given by a symmetric binary `distance' function between the representative vectors, $w(\{x,y\}) = d(\vec{x},\vec{y})$. This work is motivated by the task of clustering high-dimensional embeddings produced by neural networks, where low-dimensional algorithms are ineffective. Geometric minimum spanning trees serve as a subroutine in the construction of single-linkage dendrograms, since the two structures can be converted into each other efficiently.
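For reference only, here is a minimal sketch of the exact quadratic-time baseline for a complete geometric graph. It is not the decomposition method introduced here. It runs dense Prim's algorithm, which evaluates $d(\vec{x},\vec{y})$ for every pair of vertices, and then uses a Kruskal-style union-find pass to turn the tree into single-linkage merges. The function names, the Euclidean default for `d`, and the SciPy-like merge row layout `(cluster_a, cluster_b, height, size)` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Edge = Tuple[int, int, float]


def euclidean(x: Vector, y: Vector) -> float:
    """Default symmetric distance d(x, y); any symmetric function may be passed instead."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def dense_prim_mst(points: Sequence[Vector],
                   d: Callable[[Vector, Vector], float] = euclidean) -> List[Edge]:
    """Exact MST of the complete graph with w({x, y}) = d(x, y), using O(n^2) Prim (no heap)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n      # tree endpoint of that cheapest edge
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Linear scan for the cheapest vertex outside the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # The graph is complete, so every remaining vertex is a neighbour of u.
        for v in range(n):
            if not in_tree[v]:
                w = d(points[u], points[v])
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def mst_to_single_linkage(n: int, edges: List[Edge]) -> List[Tuple[int, int, float, int]]:
    """Convert MST edges into single-linkage merges; merged clusters get ids n, n+1, ..."""
    parent = list(range(2 * n - 1))  # union-find over original points and merged clusters
    size = [1] * n + [0] * (n - 1)

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    merges = []
    # Processing tree edges in increasing weight order yields the dendrogram merges.
    for k, (a, b, w) in enumerate(sorted(edges, key=lambda e: e[2])):
        ra, rb = find(a), find(b)
        new = n + k
        parent[ra] = parent[rb] = new
        size[new] = size[ra] + size[rb]
        merges.append((ra, rb, w, size[new]))
    return merges


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    mst = dense_prim_mst(pts)
    print(mst)                                   # [(0, 1, 1.0), (1, 2, 4.0)]
    print(mst_to_single_linkage(len(pts), mst))  # [(0, 1, 1.0, 2), (3, 2, 4.0, 3)]
```

Both passes are exact. The $\Theta(n^2)$ distance evaluations in `dense_prim_mst` are the cost that becomes prohibitive for large sets of high-dimensional embeddings. The second pass shows why the tree-to-dendrogram conversion is cheap: it needs only a sort of $n-1$ edges and near-constant-time union-find operations.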