Hypergraphs arise naturally in the study of group relations and are widely used in machine learning. To the best of our knowledge, the recently proposed edge-dependent vertex weights (EDVW) model is one of the most general hypergraph models: most existing hypergraph models can be expressed as EDVW hypergraphs without information loss. However, algorithmic development for EDVW hypergraphs remains nascent. Compared with spectral graph theory, the formulations are incomplete, spectral clustering algorithms are underdeveloped, and a hypergraph Cheeger inequality has not been properly defined. To address this, we derive a unified random-walk-based formulation and propose definitions of the hypergraph Rayleigh quotient, NCut, boundary/cut, volume, and conductance that are consistent with the corresponding definitions on graphs. We then prove that the normalized hypergraph Laplacian is associated with the NCut value, which motivates our proposed HyperClus-G algorithm for spectral clustering on EDVW hypergraphs. Finally, we prove that HyperClus-G always finds an approximately linearly optimal partitioning in terms of both NCut and conductance. Extensive experiments validate our theoretical findings empirically. The code for HyperClus-G is available at https://github.com/iDEA-iSAIL-Lab-UIUC/HyperClus-G.
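To make the random-walk view concrete, the following is a minimal sketch of spectral bipartitioning on a toy EDVW hypergraph. It is not the HyperClus-G implementation from the repository above. It assumes the commonly used EDVW random walk (from a vertex, pick an incident hyperedge with probability proportional to its weight, then pick a vertex in that hyperedge with probability proportional to its edge-dependent weight), a Chung-style normalized Laplacian built from the walk's stationary distribution, and a generic sweep cut scored by random-walk conductance. All function names, the matrix layout of `R`, and the toy data are hypothetical choices made for illustration.

```python
import numpy as np

def edvw_transition_matrix(R, omega):
    """Transition matrix of the assumed EDVW random walk.

    R[e, v] > 0 is the weight of vertex v in hyperedge e (0 if v not in e);
    omega[e] is the weight of hyperedge e.
    """
    R = np.asarray(R, dtype=float)
    omega = np.asarray(omega, dtype=float)
    W = (R > 0).astype(float).T * omega      # W[v, e] = omega[e] if v in e
    d_v = W.sum(axis=1)                      # vertex degrees
    delta_e = R.sum(axis=1)                  # total vertex weight per hyperedge
    return (W / d_v[:, None]) @ (R / delta_e[:, None])

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    phi = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return phi / phi.sum()

def normalized_laplacian(P, phi):
    """I - (Phi^{1/2} P Phi^{-1/2} + its transpose) / 2."""
    s = np.sqrt(phi)
    A = (s[:, None] * P) / s[None, :]
    return np.eye(len(phi)) - (A + A.T) / 2.0

def conductance(P, phi, mask):
    """Stationary flow leaving the set, divided by the smaller side's volume."""
    flow = (phi[mask][:, None] * P[mask][:, ~mask]).sum()
    vol = phi[mask].sum()
    return flow / min(vol, 1.0 - vol)

def spectral_bipartition(R, omega):
    """Sweep over the second eigenvector and keep the lowest-conductance cut."""
    P = edvw_transition_matrix(R, omega)
    phi = stationary_distribution(P)
    _, vecs = np.linalg.eigh(normalized_laplacian(P, phi))
    order = np.argsort(vecs[:, 1] / np.sqrt(phi))
    n = len(phi)
    best_mask, best_value = None, np.inf
    for k in range(1, n):
        mask = np.zeros(n, dtype=bool)
        mask[order[:k]] = True
        value = conductance(P, phi, mask)
        if value < best_value:
            best_mask, best_value = mask, value
    return best_mask, best_value

# Toy data (hypothetical): two dense hyperedges joined by one light hyperedge.
R_toy = [[1, 2, 1, 0, 0, 0],
         [0, 0, 0, 1, 1, 2],
         [0, 0, 1, 1, 0, 0]]
omega_toy = [1.0, 1.0, 0.1]
mask_toy, phi_cut = spectral_bipartition(R_toy, omega_toy)
print(np.flatnonzero(mask_toy), phi_cut)
```

The sweep uses dense eigendecompositions, which is adequate for small examples only; a sparse eigensolver would be the natural substitute at scale.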