Consider the unsupervised classification problem in random hypergraphs under the non-uniform \emph{Hypergraph Stochastic Block Model} (HSBM) with two equal-sized communities of $n/2$ vertices each, where each edge appears independently with a probability depending only on the labels of its vertices. In this paper, an \emph{information-theoretic} threshold for strong consistency is established. Below the threshold, every algorithm misclassifies at least two vertices with high probability, and the expected \emph{mismatch ratio} of the eigenvector estimator is upper bounded by $n$ raised to the negative of the threshold. Above the threshold, despite the information loss induced by tensor contraction, one-stage spectral algorithms classify every vertex correctly with high probability given only the contracted adjacency matrix, even in some scenarios where \emph{semidefinite programming} (SDP) fails. Moreover, strong consistency is achievable by aggregating information from all uniform layers, even when it is impossible for each layer considered alone. Our conclusions are supported by both theoretical analysis and numerical experiments.
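To make the pipeline concrete, the following is a minimal sketch of tensor contraction followed by a one-stage spectral estimate, not the algorithm analyzed in the paper. It rests on several illustrative assumptions: each uniform layer uses only two probabilities (one for edges whose vertices all share a label, one for all other edges), the contracted adjacency matrix counts the edges containing each vertex pair, and the estimator takes the sign of the eigenvector of the second-largest eigenvalue. The function names and parameter values are hypothetical.

```python
import itertools
import numpy as np

def sample_hsbm(n, probs, rng):
    """Sample a non-uniform HSBM with two equal-sized communities.

    probs maps edge size m -> (p_in, p_out): a size-m edge appears with
    probability p_in if all its vertices share a label, else p_out.
    (This two-parameter form per layer is an illustrative assumption.)
    """
    labels = np.array([1] * (n // 2) + [-1] * (n - n // 2))
    edges = []
    for m, (p_in, p_out) in probs.items():
        for e in itertools.combinations(range(n), m):
            same = abs(labels[list(e)].sum()) == m
            if rng.random() < (p_in if same else p_out):
                edges.append(e)
    return labels, edges

def contract(n, edges):
    """Contracted adjacency: A[i, j] = number of edges containing both i and j."""
    A = np.zeros((n, n))
    for e in edges:
        for i, j in itertools.combinations(e, 2):
            A[i, j] += 1
            A[j, i] += 1
    return A

def spectral_labels(A):
    """Sign of the eigenvector for the second-largest eigenvalue of A."""
    _, vecs = np.linalg.eigh(A)  # eigenvalues are returned in ascending order
    return np.where(vecs[:, -2] >= 0, 1, -1)

def mismatch_ratio(truth, est):
    """Fraction of misclassified vertices, up to a global label flip."""
    err = np.mean(truth != est)
    return min(err, 1 - err)

rng = np.random.default_rng(0)
n = 40
# Hypothetical layer parameters: a 2-uniform and a 3-uniform layer.
labels, edges = sample_hsbm(n, {2: (0.5, 0.05), 3: (0.05, 0.005)}, rng)
A = contract(n, edges)
est = spectral_labels(A)
ratio = mismatch_ratio(labels, est)
print(f"mismatch ratio: {ratio:.3f}")
```

Note that the estimator sees only `A`, so the sizes of the original edges are discarded; this is the information loss from contraction mentioned above. Passing a single layer in `probs` gives the uniform case for comparison with the aggregated one.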