Rank-based linkage is a new tool for summarizing a collection $S$ of objects according to their relationships. These objects are not mapped to vectors, and ``similarity'' between objects need be neither numerical nor symmetrical. Each object need only rank nearby objects by similarity to itself, using a comparator which is transitive but need not be consistent with any metric on the whole set. Call this a ranking system on $S$. Rank-based linkage is applied to the $K$-nearest neighbor digraph derived from a ranking system. Computations occur on a 2-dimensional abstract oriented simplicial complex whose faces are among the points, edges, and triangles of the line graph of the undirected $K$-nearest neighbor graph on $S$. In $|S| K^2$ steps it builds an edge-weighted linkage graph $(S, \mathcal{L}, \sigma)$, where $\sigma(\{x, y\})$ is called the in-sway between objects $x$ and $y$. Take $\mathcal{L}_t$ to be the set of links whose in-sway is at least $t$, and partition $S$ into the components of the graph $(S, \mathcal{L}_t)$, for varying $t$. Rank-based linkage is a functor from a category of out-ordered digraphs to a category of partitioned sets, with the practical consequence that augmenting the set of objects in a rank-respectful way gives a fresh clustering which does not ``rip apart'' the previous one. The same holds for single linkage clustering in the metric space context, but not for typical optimization-based methods. Open combinatorial problems are presented in the last section.
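The final thresholding step can be illustrated with a minimal Python sketch. It assumes the linkage graph $(S, \mathcal{L}, \sigma)$ has already been built and is supplied as a dictionary from unordered pairs to in-sway values; how the in-sway itself is computed is not shown. The function name `components_at_threshold`, the union-find approach, and the toy in-sway values are illustrative assumptions, not part of the method as described above.

```python
from collections import defaultdict


def components_at_threshold(objects, in_sway, t):
    """Partition `objects` into connected components of (S, L_t).

    `in_sway` maps frozenset({x, y}) -> sigma({x, y}); L_t keeps the
    links whose in-sway is at least t. (Hypothetical helper.)
    """
    parent = {x: x for x in objects}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for link, weight in in_sway.items():
        if weight >= t:
            x, y = tuple(link)
            root_x, root_y = find(x), find(y)
            if root_x != root_y:
                parent[root_x] = root_y

    blocks = defaultdict(set)
    for x in objects:
        blocks[find(x)].add(x)
    return {frozenset(block) for block in blocks.values()}


# Toy linkage graph; the in-sway values are made up for illustration.
S = ["a", "b", "c", "d", "e"]
sigma = {
    frozenset({"a", "b"}): 5,
    frozenset({"b", "c"}): 2,
    frozenset({"d", "e"}): 4,
}

partition_t3 = components_at_threshold(S, sigma, 3)
partition_t1 = components_at_threshold(S, sigma, 1)
```

With these toy values, raising $t$ from 1 to 3 drops the link $\{b, c\}$, so the block containing $a$, $b$, $c$ splits while the block $\{d, e\}$ is unchanged: the partitions are nested as $t$ varies.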