This paper develops an approximation to the (effective) $p$-resistance and applies it to multi-class clustering. Spectral methods based on the graph Laplacian and its generalization, the graph $p$-Laplacian, have been a backbone of non-Euclidean clustering techniques. The advantage of the $p$-Laplacian is that the parameter $p$ induces a controllable bias on cluster structure. The drawback of $p$-Laplacian eigenvector-based methods is that the third and higher eigenvectors are difficult to compute. We are therefore motivated to cluster using the $p$-resistance induced by the $p$-Laplacian instead. For $p$-resistance, small $p$ biases towards clusters with high internal connectivity, while large $p$ biases towards clusters of small ``extent,'' that is, a preference for smaller shortest-path distances between vertices in the cluster. However, the $p$-resistance is expensive to compute. We overcome this by developing an approximation to the $p$-resistance. We prove upper and lower bounds on this approximation and observe that it is exact when the graph is a tree. We also provide theoretical justification for the use of $p$-resistance for clustering. Finally, we provide experiments comparing our approximated $p$-resistance clustering to other $p$-Laplacian-based methods.
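As a point of reference for the quantity discussed above, the following is a minimal sketch of the classical effective resistance, which we take here to correspond to the $p = 2$ case. It does not reproduce the approximation developed in this paper, nor the general-$p$ definition. The function name `effective_resistance` and the two toy graphs are illustrative assumptions. The sketch uses the standard identity $R_{ij} = L^{+}_{ii} + L^{+}_{jj} - 2L^{+}_{ij}$, where $L^{+}$ is the pseudoinverse of the graph Laplacian.

```python
import numpy as np


def effective_resistance(weights):
    """Pairwise effective resistance (classical p = 2 case) of a connected graph.

    `weights` is a symmetric, non-negative adjacency matrix; each entry is
    treated as an edge conductance. This is an illustrative helper, not the
    approximation proposed in the paper.
    """
    W = np.asarray(weights, dtype=float)
    L = np.diag(W.sum(axis=1)) - W      # graph Laplacian L = D - W
    L_pinv = np.linalg.pinv(L)          # Moore-Penrose pseudoinverse
    d = np.diag(L_pinv)
    # R_ij = L+_ii + L+_jj - 2 L+_ij
    return d[:, None] + d[None, :] - 2.0 * L_pinv


# Toy tree: the path 0 - 1 - 2 - 3 with unit edge weights.
path = np.zeros((4, 4))
for i in range(3):
    path[i, i + 1] = path[i + 1, i] = 1.0
R_path = effective_resistance(path)

# Closing the path into a 4-cycle adds a second route between vertices.
cycle = path.copy()
cycle[0, 3] = cycle[3, 0] = 1.0
R_cycle = effective_resistance(cycle)
```

On the unit-weight path, the resistance between two vertices is simply the number of edges between them, which illustrates why trees are an easy case in this classical setting. Adding the extra edge in the cycle lowers the resistance between its endpoints, reflecting the higher connectivity.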