We present a theory and an objective function for similarity-based hierarchical clustering of probabilistic partial orders and directed acyclic graphs (DAGs). Specifically, given elements $x \le y$ in the partial order and their respective clusters $[x]$ and $[y]$, the theory yields an order relation $\le'$ on the clusters such that $[x]\le'[y]$. The theory provides a concise definition of order-preserving hierarchical clustering and offers a classification theorem identifying the order-preserving trees (dendrograms). To determine the optimal order-preserving trees, we develop an objective function that frames the problem as a bi-objective optimisation, aiming to satisfy both the order relation and the similarity measure. We prove that the optimal trees under the objective are both order-preserving and exhibit high-quality hierarchical clustering. Since finding an optimal solution is NP-hard, we introduce a polynomial-time approximation algorithm and demonstrate that our method outperforms existing methods for order-preserving hierarchical clustering by a significant margin.
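As a rough illustration of the order-preservation condition stated above, the sketch below checks a single flat partition of a DAG. It assumes a literal reading of the abstract: a partition is order-preserving if some partial order $\le'$ on the clusters satisfies $x \le y \Rightarrow [x] \le' [y]$. The smallest candidate for $\le'$ is the reflexive-transitive closure of the relation induced by the DAG edges, and that closure is antisymmetric (hence a partial order) exactly when the quotient graph on clusters has no directed cycle. The function names and the edge-list/dict input format are hypothetical, and the paper's formal definition (including its treatment of probabilistic orders and of whole dendrograms) may impose conditions this sketch omits.

```python
from collections import defaultdict

_DONE = object()  # sentinel so any hashable cluster label (even None) is safe


def induced_cluster_relation(edges, cluster_of):
    """Cluster pairs (A, B), A != B, with an edge x -> y, x in A, y in B.

    Edges of the DAG generate <= by transitive closure; the closure of this
    induced relation equals the closure of the relation induced by <= itself.
    """
    rel = set()
    for x, y in edges:
        a, b = cluster_of[x], cluster_of[y]
        if a != b:
            rel.add((a, b))
    return rel


def is_order_preserving(edges, cluster_of):
    """True iff the quotient graph on clusters is acyclic (hypothetical helper).

    Under the literal reading assumed here, acyclicity is equivalent to the
    reflexive-transitive closure of the induced relation being a partial order.
    """
    succ = defaultdict(set)
    for a, b in induced_cluster_relation(edges, cluster_of):
        succ[a].add(b)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    colour = {c: WHITE for c in set(cluster_of.values())}
    for start in colour:
        if colour[start] != WHITE:
            continue
        colour[start] = GREY
        stack = [(start, iter(succ[start]))]
        while stack:
            node, it = stack[-1]
            nxt = next(it, _DONE)
            if nxt is _DONE:
                colour[node] = BLACK
                stack.pop()
            elif colour[nxt] == GREY:
                return False  # cycle: some A <=' B <=' A with A != B
            elif colour[nxt] == WHITE:
                colour[nxt] = GREY
                stack.append((nxt, iter(succ[nxt])))
    return True


# Usage on the DAG a -> b -> c, a -> d.
edges = [("a", "b"), ("b", "c"), ("a", "d")]
print(is_order_preserving(edges, {"a": 0, "b": 1, "d": 1, "c": 2}))  # True
print(is_order_preserving(edges, {"a": 0, "c": 0, "b": 1, "d": 2}))  # False
```

In the second partition, $a \le b$ and $b \le c$ force $\{a, c\} \le' \{b\} \le' \{a, c\}$, so no partial order on those clusters exists. Note that, under this literal non-strict reading, lumping every element into one cluster passes trivially, so the condition alone says nothing about clustering quality; the abstract pairs it with a similarity objective.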