Clustering methods based on graph models have attracted growing attention because they apply across many knowledge domains. Because they integrate readily with other applications, graph-based clustering analyses can robustly extract the "natural associations" or "graph structures" within a dataset, which makes it easier to model relationships between data points. Despite their efficacy, existing graph-based clustering methods overlook both the uncertainty associated with random-walk access between nodes and the structural information embedded in the data. To address this gap, we present CMDI, a novel Clustering method for Maximizing Decoding Information within graph-based models. CMDI incorporates two-dimensional structural information theory into a clustering process that consists of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem in which decoding information is maximized so as to minimize the uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets show that CMDI outperforms classical baseline methods and achieves a higher decoding information ratio (DI-R). CMDI is also more efficient, particularly when prior knowledge (PK) is taken into account. These findings indicate that CMDI improves both decoding information quality and computational efficiency, making it a valuable tool for graph-based clustering analysis.
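The abstract does not state the formulas behind "decoding information", so the following is only a minimal sketch under stated assumptions. It assumes the definitions commonly used in structural information theory for an undirected, unweighted graph with m edges: the one-dimensional structural entropy H1(G) = -Σ_i (d_i / 2m) log2(d_i / 2m), the two-dimensional structural entropy of a vertex partition P, and decoding information taken as their difference, H1(G) - H2_P(G). The function names and the toy graph are hypothetical, and the sketch does not reproduce the CMDI algorithm or the DI-R metric.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Degree of every vertex in an undirected, unweighted edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def one_dim_entropy(edges):
    """Assumed definition: H1(G) = -sum_i (d_i / 2m) * log2(d_i / 2m)."""
    deg = degrees(edges)
    two_m = 2 * len(edges)
    return -sum(d / two_m * math.log2(d / two_m) for d in deg.values())


def two_dim_entropy(edges, partition):
    """Assumed definition of the structural entropy of a vertex partition.

    For each module j with volume V_j (sum of degrees) and cut size g_j
    (edges with exactly one endpoint in the module):
        H2 = -sum_j sum_{i in j} (d_i / 2m) * log2(d_i / V_j)
             -sum_j (g_j / 2m) * log2(V_j / 2m)
    """
    deg = degrees(edges)
    two_m = 2 * len(edges)
    label = {v: j for j, module in enumerate(partition) for v in module}
    vol = [sum(deg[v] for v in module) for module in partition]
    cut = [0] * len(partition)
    for u, v in edges:
        if label[u] != label[v]:
            cut[label[u]] += 1
            cut[label[v]] += 1
    h = 0.0
    for j, module in enumerate(partition):
        if vol[j] == 0:
            continue  # a module of isolated vertices contributes nothing
        for v in module:
            if deg[v] > 0:
                h -= deg[v] / two_m * math.log2(deg[v] / vol[j])
        h -= cut[j] / two_m * math.log2(vol[j] / two_m)
    return h


def decoding_information(edges, partition):
    """Uncertainty removed by the partition: H1(G) - H2_P(G)."""
    return one_dim_entropy(edges) - two_dim_entropy(edges, partition)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
GOOD = [{0, 1, 2}, {3, 4, 5}]      # follows the two triangles
BAD = [{0, 1, 3}, {2, 4, 5}]       # cuts through both triangles
TRIVIAL = [{0, 1, 2, 3, 4, 5}]     # everything in one module

if __name__ == "__main__":
    for name, part in [("good", GOOD), ("bad", BAD), ("trivial", TRIVIAL)]:
        print(name, round(decoding_information(EDGES, part), 4))
```

Under these assumed definitions, the partition that follows the two triangles yields more decoding information than one that cuts across them, while the trivial single-module partition yields none. This is the sense in which maximizing decoding information reduces the uncertainty of random visits to vertices.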