Spectral clustering methods are known for their ability to represent clusters of diverse shapes and densities. However, the results of such algorithms, when applied to text documents for example, are hard to explain to the user, mainly because the embedding in the spectral space has no obvious relation to document contents. There is therefore a pressing need for methods that explain the outcome of the clustering. This paper contributes towards that goal. We propose a way of explaining the results of graph spectral clustering based on the combinatorial Laplacian. The explanation rests on showing the (approximate) equivalence of three embeddings: the combinatorial Laplacian embedding, the $K$-embedding (proposed in this paper) and the term vector space embedding. This builds a bridge between the textual contents and the clustering results. We provide the theoretical background for this approach. Our experimental study shows that the $K$-embedding approximates the Laplacian embedding well under favourable block matrix conditions, and that the approximation remains good enough under other conditions.
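As a point of reference for the combinatorial Laplacian embedding mentioned above, the following is a minimal sketch of generic two-way spectral clustering, not the method of this paper. It assumes a symmetric similarity matrix `S` between documents (hard-coded here as a toy example with two blocks and one weak link; in practice it might come from cosine similarities of term vectors), forms $L = D - S$, and splits the documents by the sign of the eigenvector of the second-smallest eigenvalue. The function names and the toy matrix are illustrative assumptions, and the $K$-embedding is not shown because it is not defined in this abstract.

```python
def combinatorial_laplacian(S):
    """Return L = D - S, where D is the diagonal matrix of row sums of S."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Uses power iteration on c*I - L with the constant vector projected out,
    so no external linear algebra library is needed for this toy example.
    """
    n = len(L)
    # Gershgorin bound: every eigenvalue of L is at most 2 * (largest degree).
    c = 2.0 * max(L[i][i] for i in range(n))
    x = [float(i) for i in range(n)]  # deterministic start vector (an assumption)
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the trivial constant eigenvector
        y = [c * x[i] - sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


# Toy similarity matrix (hypothetical): documents 0-2 and 3-5 form two blocks,
# joined only by a weak link of weight 0.1 between documents 2 and 3.
S = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0.1, 0, 0],
    [0, 0, 0.1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

L = combinatorial_laplacian(S)
embedding = fiedler_vector(L)
labels = [1 if v > 0 else 0 for v in embedding]
print(labels)
```

On this toy matrix the sign of the embedding coordinate separates the two blocks. The coordinates themselves carry no visible link to document terms, which is the explainability gap the abstract describes.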