A hypergraph is a generalization of a graph that arises naturally when attribute-sharing among entities is considered. A hypergraph can be converted into a graph by expanding each hyperedge into a fully connected subgraph, but the reverse direction, recovering the hypergraph from the graph, is NP-complete. We therefore hypothesize that a hypergraph contains more information than a graph. It is also more convenient to manipulate a hypergraph directly than to expand it into a graph. An open problem for hypergraphs is how to compute node distances accurately and efficiently. Estimating node distances lets us find a node's nearest neighbors and perform label propagation on hypergraphs using a K-nearest neighbors (KNN) approach. In this paper, we propose a novel approach based on random walks to achieve label propagation on hypergraphs, in which we estimate node distances as the expected hitting times of random walks. We note that simple random walks (SRW) cannot accurately describe highly complex real-world hypergraphs, which motivates us to introduce frustrated random walks (FRW) to describe them better. We further benchmark our method against DeepWalk and show that, although DeepWalk can achieve comparable results, FRW has a distinct computational advantage when the number of targets is fairly small; in such cases, FRW runs significantly faster than DeepWalk. Finally, we analyze the time complexity of our method and show that it is approximately linear for large, sparse hypergraphs, making it superior to the DeepWalk alternative.
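To make the pipeline in the abstract concrete (node distance as an expected hitting time, then KNN label propagation), here is a minimal sketch that uses a *simple* random walk (SRW), not FRW, because the abstract does not define FRW. It assumes one common SRW definition on hypergraphs: from node u, pick an incident hyperedge uniformly at random, then pick a node of that hyperedge uniformly at random (u itself included). It also assumes the hypergraph is connected and that every node belongs to at least one hyperedge. The distance from an unlabeled node u to a labeled target t is taken to be the expected number of steps for a walk started at u to first reach t. The names `srw_transition_matrix`, `hitting_times_to`, and `knn_label_propagation` are hypothetical. The sketch solves one dense linear system per target, which fits the abstract's remark that the approach is cheapest when there are few targets. It is not the near-linear sparse implementation the paper analyzes.

```python
from collections import Counter

import numpy as np


def srw_transition_matrix(n_nodes, hyperedges):
    """Assumed SRW: from u, choose an incident hyperedge uniformly,
    then a node of that hyperedge uniformly (self-moves allowed)."""
    incident = [[] for _ in range(n_nodes)]
    for e in hyperedges:
        for u in e:
            incident[u].append(e)
    P = np.zeros((n_nodes, n_nodes))
    for u in range(n_nodes):
        for e in incident[u]:
            for v in e:
                P[u, v] += 1.0 / (len(incident[u]) * len(e))
    return P


def hitting_times_to(P, target):
    """Expected steps h[u] to first reach `target` from each node u.
    Solves h[target] = 0 and h[u] = 1 + sum_v P[u, v] h[v] for u != target."""
    n = P.shape[0]
    others = [i for i in range(n) if i != target]
    Q = P[np.ix_(others, others)]  # transitions among non-target nodes
    h = np.zeros(n)
    h[others] = np.linalg.solve(np.eye(n - 1) - Q, np.ones(n - 1))
    return h


def knn_label_propagation(n_nodes, hyperedges, labels, k=1):
    """labels: dict {node: label} for the labeled (target) nodes.
    Each unlabeled node takes the majority label of its k targets
    with the smallest expected hitting time."""
    P = srw_transition_matrix(n_nodes, hyperedges)
    targets = list(labels)
    # One linear solve per target: H[i, u] = expected steps from u to targets[i]
    H = np.array([hitting_times_to(P, t) for t in targets])
    predicted = dict(labels)
    for u in range(n_nodes):
        if u in labels:
            continue
        nearest = np.argsort(H[:, u])[:k]
        votes = Counter(labels[targets[i]] for i in nearest)
        predicted[u] = votes.most_common(1)[0][0]
    return predicted


# Usage: two clusters joined by a bridging hyperedge (2, 3).
hyperedges = [(0, 1, 2), (2, 3), (3, 4, 5)]
pred = knn_label_propagation(6, hyperedges, {0: "A", 5: "B"}, k=1)
print(sorted(pred.items()))
# → [(0, 'A'), (1, 'A'), (2, 'A'), (3, 'B'), (4, 'B'), (5, 'B')]
```

On this toy hypergraph the expected hitting times to node 0 work out to 0, 8, 13, 21, 24, and 24 for nodes 0 through 5. By symmetry, the times to node 5 are the same values in reverse order, so each unlabeled node is assigned the label of the target on its side of the bridge. Because hitting times are asymmetric, the direction used here (from the unlabeled node to the target) is itself an assumption.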