We study point-to-point distance estimation in hypergraphs, where each query is parameterized by a positive integer s that specifies the minimum number of vertices two hyperedges must share to be considered adjacent. To answer s-distance queries, we first consider an oracle based on the line graph of the given hypergraph and discuss its limitations, chief among them that the line graph is typically orders of magnitude larger than the original hypergraph. We then introduce HypED, a landmark-based oracle with a predefined size that is built directly on the hypergraph, which avoids constructing the line graph altogether. Our framework can approximately answer vertex-to-vertex, vertex-to-hyperedge, and hyperedge-to-hyperedge s-distance queries for any value of s. A key observation underlying our framework is that the hypergraph becomes more fragmented as s increases. We show how to exploit this to improve landmark placement by identifying the s-connected components of the hypergraph. For this task, we devise an efficient algorithm based on union-find and a dynamic inverted index. We experimentally evaluate HypED on several real-world hypergraphs and demonstrate its versatility in answering s-distance queries for different values of s. Our framework answers such queries in a fraction of a millisecond and allows fine-grained control, at creation time, over the trade-off between index size and approximation error. Finally, we demonstrate the usefulness of the s-distance oracle in two applications: hypergraph-based recommendation, and approximating the s-closeness centrality of vertices and hyperedges in the context of protein-protein interactions.
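To make the abstract's ingredients concrete, here is a minimal sketch, not HypED itself, assuming that two hyperedges are s-adjacent when they share at least s vertices and that the s-distance between hyperedges is the hop count in the s-line graph. It covers only hyperedge-to-hyperedge queries. The class name `SDistanceSketch`, the landmark heuristic (largest hyperedges first), the default of one landmark per component, and recomputing components from scratch for a single s (rather than maintaining a dynamic inverted index across values of s) are all hypothetical choices made for illustration. The s-line graph is never materialized: s-neighbours are recomputed on demand from an inverted index, s-connected components come from union-find, and a query returns the standard landmark upper bound min over landmarks l of d(e, l) + d(l, f), or infinity when e and f lie in different s-connected components.

```python
from collections import defaultdict, deque
import math


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


class SDistanceSketch:
    """Hypothetical landmark oracle for hyperedge-to-hyperedge s-distances."""

    def __init__(self, hyperedges, s, landmarks_per_component=1):
        self.edges = [set(e) for e in hyperedges]
        self.s = s
        # Inverted index: vertex -> ids of the hyperedges containing it.
        self.index = defaultdict(list)
        for i, e in enumerate(self.edges):
            for v in e:
                self.index[v].append(i)
        # s-connected components via union-find over s-adjacent pairs.
        uf = UnionFind(len(self.edges))
        for i in range(len(self.edges)):
            for j in self.s_neighbours(i):
                uf.union(i, j)
        self.comp = [uf.find(i) for i in range(len(self.edges))]
        # Assumed heuristic: the largest hyperedges of each component are landmarks.
        members = defaultdict(list)
        for i, c in enumerate(self.comp):
            members[c].append(i)
        self.tables = []  # one BFS distance table per landmark
        for nodes in members.values():
            nodes.sort(key=lambda i: len(self.edges[i]), reverse=True)
            for landmark in nodes[:landmarks_per_component]:
                self.tables.append(self._bfs(landmark))

    def s_neighbours(self, i):
        """Hyperedges sharing at least s vertices with hyperedge i."""
        overlap = defaultdict(int)
        for v in self.edges[i]:
            for j in self.index[v]:
                if j != i:
                    overlap[j] += 1
        return [j for j, c in overlap.items() if c >= self.s]

    def _bfs(self, source):
        """Exact s-distances from source, exploring the hypergraph directly."""
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in self.s_neighbours(u):
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return dist

    def query(self, e, f):
        """Upper bound on the s-distance between hyperedges e and f."""
        if e == f:
            return 0
        if self.comp[e] != self.comp[f]:
            return math.inf  # no s-walk connects different components
        return min((d[e] + d[f] for d in self.tables if e in d and f in d),
                   default=math.inf)
```

On the toy hypergraph `[{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {5, 6, 7}, {6, 7, 8}]` with `s=2`, the components are hyperedges {0, 1, 2} and {3, 4}. `query(0, 3)` returns `inf`, and `query(1, 2)` returns 3 even though the true 2-distance is 1, because the only landmark in that component is hyperedge 0. Raising `landmarks_per_component` tightens such estimates at the cost of more stored distance tables, which loosely mirrors the size-versus-error trade-off described above.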