Graph-based clustering has attracted growing attention because it applies across many knowledge domains. Because it integrates readily with other applications, graph-based clustering analysis can robustly extract the "natural associations" or "graph structures" within a dataset, which makes it easier to model relationships between data points. Despite this efficacy, existing graph-based clustering methods overlook both the uncertainty associated with random-walk access between nodes and the structural information embedded in the data. To address this gap, we present CMDI, a novel Clustering method for Maximizing Decoding Information within graph-based models. CMDI incorporates two-dimensional structural information theory into the clustering process, which consists of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem in which decoding information is maximized so as to minimize the uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets show that CMDI outperforms classical baseline methods, achieving a higher decoding information ratio (DI-R). CMDI is also more efficient, particularly when prior knowledge (PK) is taken into account. These findings indicate that CMDI improves both decoding information quality and computational efficiency, making it a valuable tool for graph-based clustering analysis.
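
To make "decoding information" concrete, the following is a minimal sketch of how it could be computed for a given vertex partition. The abstract gives no formulas, so the sketch assumes the usual definitions from structural information theory for an undirected, unweighted graph: the one-dimensional structural entropy, the two-dimensional structural entropy of a partition, and decoding information as their difference. The function name `structural_entropies` and the toy graph are hypothetical, and this is not CMDI's partitioning procedure, only the quantity such a procedure would aim to maximize.

```python
import math
from collections import defaultdict


def structural_entropies(edges, partition):
    """Return (H1, H2, DI) for an undirected, unweighted graph.

    edges:     iterable of (u, v) pairs without self-loops
    partition: dict mapping every vertex to a module label
    Assumed definitions (not taken from the abstract):
      H1 = -sum_i d_i/2m * log2(d_i/2m)
      H2 = -sum_j [ sum_{i in X_j} d_i/2m * log2(d_i/V_j)
                    + g_j/2m * log2(V_j/2m) ]
      DI = H1 - H2
    """
    degree = defaultdict(int)
    volume = defaultdict(int)  # V_j: sum of degrees inside module j
    cut = defaultdict(int)     # g_j: edges with exactly one endpoint in module j
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if partition[u] != partition[v]:
            cut[partition[u]] += 1
            cut[partition[v]] += 1
    for node, d in degree.items():
        volume[partition[node]] += d
    two_m = sum(degree.values())

    # One-dimensional structural entropy: uncertainty of a random-walk visit.
    h1 = -sum(d / two_m * math.log2(d / two_m) for d in degree.values())

    # Two-dimensional structural entropy under the given partition.
    h2 = 0.0
    for node, d in degree.items():
        h2 -= d / two_m * math.log2(d / volume[partition[node]])
    for label, vol in volume.items():
        h2 -= cut[label] / two_m * math.log2(vol / two_m)

    return h1, h2, h1 - h2


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {v: "all" for v in range(6)}

h1, h2, di = structural_entropies(edges, by_triangle)
_, _, di_trivial = structural_entropies(edges, one_module)
print(f"DI (two triangles) = {di:.4f}, DI (single module) = {di_trivial:.4f}")
```

Under these assumed definitions, splitting the toy graph along the bridge yields a decoding information of 6/7, while placing every vertex in one module yields zero, which illustrates why a partition that respects the graph structure removes more of the uncertainty in a random visit.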