Community detection in real-world networks is typically addressed with graph clustering methods that partition the nodes of a network into disjoint subsets. While the definition of a community may vary, it is generally accepted that the elements of a community should be ``well-connected''. We evaluated clusters generated by the Leiden algorithm and the Iterative K-core (IKC) clustering algorithm for their susceptibility to becoming disconnected by the deletion of a small number of edges. A striking observation is that for Leiden clusterings of real-world networks, except in cases with large resolution parameter values, the majority of clusters do not meet the relatively mild condition we enforce for well-connected clusters. We also constructed a modular pipeline that produces well-connected output clusters under a user-specified criterion for a valid community, based on cluster size and minimum edge cut size, and we describe the use of this pipeline on real-world and synthetic networks. An interesting trend we observed is that the final clusterings on real-world networks had small node coverage, suggesting that not all nodes in a network belong to communities.
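To make the notion of a well-connected cluster concrete, the following is a minimal sketch of a validity check based on cluster size and minimum edge cut size. It is not the pipeline described above: the function names, the adjacency-dictionary graph representation, and the default threshold of $\log_{10} n$ are all assumptions chosen for illustration, since the criterion is user-specified. The minimum edge cut is computed here with a simple unit-capacity max-flow routine, which is adequate only for small clusters.

```python
import math
from collections import defaultdict, deque


def max_flow_unit(adj, s, t):
    """Max flow from s to t in an undirected simple graph with unit edge capacities."""
    cap = defaultdict(int)
    for u in adj:
        for v in adj[u]:
            cap[(u, v)] = 1  # each undirected edge has capacity 1 in both directions
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def min_edge_cut_size(adj):
    """Size of a global minimum edge cut (0 if the graph is disconnected)."""
    nodes = list(adj)
    if len(nodes) < 2:
        return 0
    s = nodes[0]
    # Any global cut separates s from some other node t.
    return min(max_flow_unit(adj, s, t) for t in nodes[1:])


def is_valid_cluster(graph, cluster, min_size=2,
                     threshold=lambda n: math.log10(n)):
    """Hypothetical criterion: large enough, and min cut strictly above threshold(n)."""
    cluster = set(cluster)
    n = len(cluster)
    if n < min_size:
        return False
    # Restrict the graph to the edges with both endpoints inside the cluster.
    sub = {u: {v for v in graph[u] if v in cluster} for u in cluster}
    return min_edge_cut_size(sub) > threshold(n)


def build_graph(edges):
    """Build an undirected adjacency dictionary from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


# Two triangles joined by a single bridge edge (2, 3).
bridged = build_graph([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
# A complete graph on four nodes.
k4 = build_graph([(a, b) for a in range(4) for b in range(a + 1, 4)])
```

In this sketch, the two triangles joined by a bridge can be disconnected by deleting one edge, so a criterion that demands a minimum cut larger than 1 (for example `threshold=lambda n: 1`) rejects the six-node cluster while accepting each triangle and the four-node clique.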