Statistical analysis and node clustering in hypergraphs constitute an emerging topic that suffers from a lack of standardization. In contrast to graphs, the concept of a node community in hypergraphs is not unique and covers several distinct situations. In this work, we conduct a comparative analysis of the performance of modularity-based methods for clustering nodes in binary hypergraphs. To this end, we first present, within a unified framework, the various hypergraph modularity criteria proposed in the literature, emphasizing their differences and respective focuses. We then give an overview of the state-of-the-art codes available for maximizing hypergraph modularities to detect node communities in binary hypergraphs. By exploring various simulation settings with a controlled ground-truth clustering, we compare these methods using different quality measures, including recovery of the true clustering, running time, (local) maximization of the objective, and the number of clusters detected. Our contribution is a first attempt to clarify the advantages and drawbacks of these newly available methods. This effort lays the foundation for a better understanding of the primary objectives of modularity-based node clustering methods for binary hypergraphs.